Extract text from brackets


I have a file like this:

input file:

Evigen1000005_c0_g1_i1  0.240   1.212   1.408   3.784   2.029   0.963   -1.22409810298695       1       NA      NA      NA      NA      PF04597.13;Ribophorin_I;4.6e-148        NA      1;21;0.875      len=569;ExpAA=32.49;First60=12.82;PredHel=1;Topology=o433-450i     Q9SFX3  OST1A_ARATH     reviewed        Dolichyl-diphosphooligosaccharide--protein_glycosyltransferase_subunit_1A_(EC_2.4.99.18)_(Ribophorin_IA)_(RPN-IA)_(Ribophorin-1A)  OST1A_RPN1A_At1g76400_F15M4.10  Arabidopsis_thaliana_(Mouse-ear_cress)  614     protein_N-linked_glycosylation_via_asparagine_[GO:0018279]      endoplasmic_reticulum_[GO:0005783];_integral_component_of_membrane_[GO:0016021];_membrane_[GO:0016020];_oligosaccharyltransferase_complex_[GO:0008250]     dolichyl-diphosphooligosaccharide-protein_glycotransferase_activity_[GO:0004579]        3702.AT1G76400.1;  PF04597;        IPR007676;      3702    ath:AT1G76400;  F15M4.10        2.4.99.18       SUBCELLULAR_LOCATION:_Endoplasmic_reticulum_membrane_{ECO:0000250};_Single-pass_type_I_membrane_protein_{ECO:0000250}.  SIGNAL_1_25_{ECO:0000255}. AT1G76400.1;    NA      NA      NA      NA      NA      NA      NA      NA      NA      NA      NA 

Evigen1000006_c0_g1_i1  0.358   0.179   0.000   0.424   0.139   0.183   NA      NA      NA      NA      NA      NA      PF07767.10;Nop53_(60S_ribosomal_biogenesis);5.2e-21     NA      1;31;0.588      len=170;ExpAA=14.33;First60=14.27;PredHel=0;Topology=o     O22892  NOP53_ARATH     reviewed        Ribosome_biogenesis_protein_NOP53       At2g40430_T2P4.22       Arabidopsis_thaliana_(Mouse-ear_cress)  442     ribosomal_large_subunit_assembly_[GO:0000027];_ribosomal_large_subunit_export_from_nucleus_[GO:0000055]    nucleolus_[GO:0005730];_nucleoplasm_[GO:0005654]        rRNA_binding_[GO:0019843]       3702.AT2G40430.2;       PF07767;   IPR011687;      3702    ath:AT2G40430;  T2P4.22 SUBCELLULAR_LOCATION:_Nucleus,_nucleolus_{ECO:0000250|UniProtKB:Q9NZM5}._Nucleus,_nucleoplasm_{ECO:0000250|UniProtKB:Q9NZM5}.   AT2G40430.1_[O22892-1]; NA NA      NA      NA      NA      NA      NA      NA      NA      NA      NA      NA      NA

I want to extract the text inside square brackets that starts with "GO:". Each "GO:" is followed by 7 digits, e.g. "GO:0018279"; these are GO terms. The number of GO terms differs from row to row. The output must be a file whose first column contains the unitranscript IDs (e.g. Evigen1000005_c0_g1_i1), followed by the GO terms. I want an output file like this:

output file:

Evigen1000005_c0_g1_i1 GO:0018279 GO:0005783 GO:0016021 GO:0016020 GO:0008250 GO:0004579

Evigen1000006_c0_g1_i1 GO:0000027 GO:0000055 GO:0005730 GO:0005654 GO:0019843
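A minimal sketch of this selection rule in POSIX awk, assuming the transcript ID is the first whitespace-separated field. `extract_go_awk` is a hypothetical helper name. It keeps only bracketed items of the exact form `[GO:` plus seven digits `]`, so `[O22892-1]` and `{ECO:0000250}` are skipped without any manual exclusion list.

```sh
#!/bin/sh
# extract_go_awk: hypothetical helper name, for illustration only.
# Prints the first field (transcript ID) followed by every GO ID that
# appears inside square brackets as [GO:nnnnnnn]; other bracketed text
# such as [O22892-1] is ignored.
extract_go_awk() {
    awk '
        NF == 0 { next }      # skip blank lines
        {
            out  = $1         # transcript ID
            rest = $0
            # seven explicit [0-9] classes instead of {7}, because some
            # older awk implementations lack interval expressions
            while (match(rest, /\[GO:[0-9][0-9][0-9][0-9][0-9][0-9][0-9]]/)) {
                # RSTART/RLENGTH describe the match; strip the brackets
                out  = out " " substr(rest, RSTART + 1, RLENGTH - 2)
                rest = substr(rest, RSTART + RLENGTH)
            }
            print out
        }
    ' "$@"
}
```

After sourcing it, usage would be something like `extract_go_awk input.txt > go_terms.txt`. Looping with `match()` instead of deleting everything else means the number of GO terms per row does not matter.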

asked Nov 23 at 22:09 by Mehdi (edited Nov 25 at 9:49)

Should [GO:0016021] and [O22892-1] be excluded from the output? In that case why? Please specify in detail how to select the output!
– sudodus
Nov 24 at 0:22

Yes, it should be excluded.
– Mehdi
Nov 24 at 7:06

If you explain why to exclude these data (what makes them 'wrong'), it should be possible to design a method to exclude them automatically. Otherwise we can only guess or leave the exclusion to manual methods.
– sudodus
Nov 24 at 10:54


The text in brackets that starts with GO: holds the Gene Ontology terms for each unitranscript (e.g. Evigen1000005_c0_g1_i1). To do a GO categorization with the WEGO tool, we need a file whose first column is the unitranscript IDs and whose remaining columns are the GO terms.
– Mehdi
Nov 24 at 11:34


No, it should not be excluded. There was a mistake in the output file; I have edited it.
– Mehdi
Nov 24 at 17:30

Tags: shell-script awk sed grep


3 Answers

accepted

Suggested script that matches your final specification:

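#!/bin/bash
# Usage: ./script inputfile

while IFS= read -r line
do
    # transcript ID = everything before the first space
    name=${line%% *}
    echo -n "$name "
    # every [GO:xxxxxxx] on the line, joined by spaces, brackets removed
    data=$(<<< "$line" grep -o '\[GO:.\{7\}]' | tr '\n' ' ' | sed -e 's/\[//g' -e 's/]//g')
    echo "$data"
done < "$1"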

Testing:

$ ./script input 

Evigen1000005_c0_g1_i1 GO:0018279 GO:0005783 GO:0016021 GO:0016020 GO:0008250 GO:0004579 

Evigen1000006_c0_g1_i1 GO:0000027 GO:0000055 GO:0005730 GO:0005654 GO:0019843
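The loop above starts `grep`, `tr` and `sed` once per input line, which may be slow on a large annotation table. A minimal pure-bash sketch of the same idea, assuming bash 3.2 or later (for `[[ =~ ]]` and `BASH_REMATCH`); `extract_go_bash` is a hypothetical name. Unlike `.\{7\}`, it requires the seven characters after `GO:` to be digits.

```bash
#!/bin/bash
# extract_go_bash: hypothetical helper name, for illustration only.
# Same output format as the grep/tr/sed loop, but no external
# process is started for each input line.
extract_go_bash() {
    local line out rest
    local re='\[(GO:[0-9]{7})]'            # ERE; group 1 = GO ID without brackets
    while IFS= read -r line || [ -n "$line" ]; do
        [ -z "$line" ] && continue          # skip blank lines
        out=${line%%[[:space:]]*}           # transcript ID = first field
        rest=$line
        while [[ $rest =~ $re ]]; do
            out+=" ${BASH_REMATCH[1]}"
            # drop everything up to and including this match (quoted = literal)
            rest=${rest#*"${BASH_REMATCH[0]}"}
        done
        printf '%s\n' "$out"
    done
}
```

Usage would be `extract_go_bash < input > output`. Because each match is cut off before the next test, the inner loop visits the GO terms left to right and stops when none remain.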

answered Nov 23 at 23:55 by sudodus (edited Nov 24 at 18:50)

That is great, it works. Thanks
– Mehdi
Nov 24 at 9:50

@Mehdi, You are welcome :-)
– sudodus
Nov 24 at 11:14


How about

sed -r 's/(^[^[:space:]]* )[^[]*\[/\1/; s/][^[]*(\[|$)/ /g' file

Evigen1000005_c0_g1_i1 GO:0018279 GO:0005783 GO:0016021 GO:0016020 GO:0008250 GO:0004579 

Evigen1000006_c0_g1_i1 GO:0000027 GO:0000055 GO:0005730 GO:0005654 GO:0019843 O22892-1

Your desired output does NOT reflect the processed input sample.

EDIT: or even

sed -r 's/((^[^ ]* )|])[^[]*(\[|$)/\2 /g' file

EDIT: since your question and desired output have been revised three times, try

sed -r 's/((^[^ ]* )|])[^[]*(\[GO)/\2 GO/g; s/].*$//' file
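`-r` is a GNU sed option. A sketch of that last command rewritten for `-E`, which recent GNU sed and BSD/macOS sed both accept, with the ID group adjusted so exactly one space follows the ID and tab-separated input also works. `extract_go_sed` is a hypothetical name. Like the command above, it assumes every line carries at least one GO term and that no other bracketed text sits between two GO terms.

```sh
#!/bin/sh
# extract_go_sed: hypothetical helper name, for illustration only.
extract_go_sed() {
    # 1st s///: "ID<whitespace>...[GO" at line start becomes "ID GO" (\2 = ID),
    #           and each "]...[GO" between two terms becomes " GO" (\2 empty).
    # 2nd s///: the first "]" still left closes the last GO term; delete it
    #           and the rest of the line.
    sed -E 's/((^[^[:space:]]*)[[:space:]]|])[^[]*\[GO/\2 GO/g; s/].*$//' "$@"
}
```

Putting the whitespace outside the inner group means `\2` can only ever be the complete ID, so the output does not depend on how the regex engine splits the match between groups.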

answered Nov 23 at 22:51 by RudiC (edited Nov 25 at 22:16)


Faster in sed:

start='\[GO:' end=']'

# map "[GO:" to byte \001 and "]" to byte \002 so [^...] classes can stop at them
sed -e 's,'"$start"$',\1,g' -e 's,'"$end"$',\2,g' \
    -e $'s, [^\1]*, ,' -e $'s,\1\\([^\2]*\\)\2[^\1]*,GO:\\1 ,g' \
    infile

or awk:

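# one/two are single control bytes standing in for "[GO:" and "]"
awk -v one=$'\1' -v two=$'\3' -v start='\\[GO:' -v end=']' '
    {
        printf("%s ", $1)                    # transcript ID
        gsub(start, one)
        gsub(end, two)
        sub("^[^" one "]*" one, "GO:")       # drop text before the first GO term
        gsub(two "[^" one "]*" one, " GO:")  # text between terms becomes " GO:"
        sub(two ".*$", "")                   # drop text after the last term
    }
    1' infile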

answered Nov 26 at 0:48 by Isaac

Source: Unix & Linux Stack Exchange
