Add lines to files to make them equal length
I have a bunch of .csv files with N columns and a different number of rows (lines) each. I would like to append as many empty lines ;...; (N-1 semicolons, one per field separator) as needed to make them all the same length. I can find the length of the longest file manually, but it would be good to have that done automatically as well.
For example:
I have,
file1.csv
128; pep; 93; 22:22:10; 3; 11
127; qep; 93; 12:52:10; 3; 15
171; pep; 73; 22:26:10; 3; 72
file2.csv
128; pep; 93; 22:22:10; 3; 11
127; qep; 93; 12:52:10; 3; 15
121; fng; 96; 09:42:10; 3; 52
141; gep; 53; 21:22:10; 3; 62
171; pep; 73; 22:26:10; 3; 72
221; ahp; 93; 23:52:10; 3; 892
file3.csv
121; fng; 96; 09:42:10; 3; 52
171; pep; 73; 22:26:10; 3; 72
221; ahp; 93; 23:52:10; 3; 892
141; gep; 53; 21:22:10; 3; 62
I need,
file1.csv
128; pep; 93; 22:22:10; 3; 11
127; qep; 93; 12:52:10; 3; 15
171; pep; 73; 22:26:10; 3; 72
;;;;;
;;;;;
;;;;;
file2.csv
128; pep; 93; 22:22:10; 3; 11
127; qep; 93; 12:52:10; 3; 15
121; fng; 96; 09:42:10; 3; 52
141; gep; 53; 21:22:10; 3; 62
171; pep; 73; 22:26:10; 3; 72
221; ahp; 93; 23:52:10; 3; 892
file3.csv
121; fng; 96; 09:42:10; 3; 52
171; pep; 73; 22:26:10; 3; 72
221; ahp; 93; 23:52:10; 3; 892
141; gep; 53; 21:22:10; 3; 62
;;;;;
;;;;;
shell-script text-processing awk files csv
edited yesterday
Jeff Schaller
asked yesterday
myradio
A simple (but probably not optimal) way to do it would be to use `wc` to count the lines of each file and find the maximum. You can then run `echo ";;;;" >> file` on each file until its line count reaches the maximum.
– Bear'sBeard
yesterday
Why do you want the files to have the same number of lines? Maybe there is a good method where you can use the files as they are (with their different numbers of lines).
– sudodus
yesterday
@Bear'sBeard Yep, something like that did it; I was looking for a more compact way.
– myradio
yesterday
@sudodus Well, there are people before and after me in the pipeline, and things must match certain formats...
– myradio
yesterday
3 Answers
Thanks to @Sparhawk for the suggestions in the comments; I have updated the script accordingly:

    #!/bin/bash
    # Padding line: one semicolon per field separator in your files.
    emptyLine=';;;;;;;'
    # Line count of every file; sed drops the "total" line that wc prints last.
    rr=($(wc -l files*pattern.txt | awk '{print $1}' | sed '$ d'))
    # Print one count per line so that sort can pick the largest.
    max=$(printf '%s\n' "${rr[@]}" | sort -nr | head -n1)
    for name in files*pattern.txt; do
        lineNumber=$(wc -l < "$name")
        missing=$((max - lineNumber))
        for ((i = 0; i < missing; i++)); do
            echo "$emptyLine" >> "$name"
        done
    done

Well, it is neither elegant nor efficient. In fact, it takes a couple of seconds, which sounds like an eternity given the small size of the data. Nevertheless, it works. The original version was:

    #!/bin/bash
    emptyLine=';;;;;;;'
    rr=($(wc -l files*pattern.txt | awk '{print $1}' | sed '$ d'))
    max=$(printf '%s\n' "${rr[@]}" | sort -nr | head -n1)
    for name in $(ls files*pattern.txt); do
        lineNumber=$(cat "$name" | wc -l)
        missing=$((max - lineNumber))
        for ((i = 0; i < missing; i++)); do
            echo "$emptyLine" >> "$name"
        done
    done

I just put this script in the directory that contains the files. It works provided that there is a pattern, such as `files*pattern.txt`, that lists them.
edited 16 hours ago
answered yesterday
myradio
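A side note on the `sed '$ d'` step in the script above: it assumes that `wc` prints a trailing "total" line, which it only does when more than one file matches the pattern. The following is a minimal sketch that avoids the issue by asking `wc` about each file separately. The name `max_lines` is a hypothetical helper, not part of the answer.

```shell
# Hypothetical helper (not from the answer): print the largest line count
# among the files given as arguments. The body runs in a subshell, so the
# variables it sets do not leak into the caller.
max_lines() (
    max=0
    for f in "$@"; do
        # Arithmetic expansion strips any padding that wc adds to the count.
        n=$(( $(wc -l < "$f") ))
        if [ "$n" -gt "$max" ]; then
            max=$n
        fi
    done
    echo "$max"
)
```

It could then replace the two lines that compute `rr` and `max`, for example `max=$(max_lines files*pattern.txt)`, at the cost of one `wc` call per file.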
Nice one (+1)! Thanks for posting the solution. One small note: don't parse `ls`; just use `for name in files*pattern.txt; do` instead.
– Sparhawk
yesterday
And while I'm nit-picking, there's a "useless use of cat" there too. Just do `wc -l $name`.
– Sparhawk
yesterday
@Sparhawk: I think you meant `wc -l < $name`.
– Thor
yesterday
@Thor No? `wc [OPTION]... [FILE]...` works too, as per the man page. In fact, this script uses this construction in an earlier line.
– Sparhawk
yesterday
@Sparhawk: Sure, but if you want output equivalent to `cat file | wc -l`, redirection is the way to go.
– Thor
17 hours ago
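The difference discussed in these comments can be seen with a small experiment. This is only an illustrative sketch: the file name `demo.csv` and the variable names are made up for the demonstration.

```shell
# Create a three-line demo file in a throwaway directory.
tmpdir=$(mktemp -d)
printf 'a\nb\nc\n' > "$tmpdir/demo.csv"

# With a file argument, wc prints the count followed by the file name.
with_name=$(wc -l "$tmpdir/demo.csv")

# With redirection, wc only sees standard input, so it prints the count
# alone, which is what cat file | wc -l produces as well.
count_only=$(wc -l < "$tmpdir/demo.csv")

echo "$with_name"
echo "$count_only"

rm -r "$tmpdir"
```

Only the second form can be used directly in arithmetic such as `max - lineNumber`, because the first one carries the file name along.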
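An improvement on @myradio's answer.
The part inside the loop is written in awk, which should be much faster.

    # Largest line count; sed drops the "total" line that wc prints last.
    max=$(wc -l file*.csv | sed '$ d' | sort -n | tail -n1 | awk '{print $1}')
    for f in file*.csv; do
        awk -F';' -v max="$max" '
            END {
                # In END, NF still holds the field count of the last record.
                s = sprintf("%*s", NF-1, "")   # a string of NF-1 spaces
                gsub(/ /, FS, s)               # turn each space into the separator
                for (i = NR; i < max; i++)
                    print s
            }' "$f" >> "$f"
    done

With `-F` you set the correct field separator of your files (here `-F';'`).
The `s=sprintf(); gsub();` part dynamically builds a line with the right number of separators, that is, NF-1 copies of FS (the field separator); the trick is borrowed from another answer.
You could simply replace that with `print ";;;;;"` or other static content if you like.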
edited 15 hours ago
answered yesterday
RoVo
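The `sprintf`/`gsub` trick used in the answer above can be tried on its own. This is a minimal sketch: `repeat_sep` is a hypothetical wrapper name, and it assumes an awk that accepts `*` as a field width in `sprintf`, as the answer's code already does.

```shell
# Hypothetical wrapper: print a row of n separators.
# $1 = how many separators, $2 = the separator character.
repeat_sep() {
    awk -v n="$1" -v sep="$2" 'BEGIN {
        s = sprintf("%*s", n, "")   # pad the empty string to n spaces
        gsub(/ /, sep, s)           # replace every space with the separator
        print s
    }'
}

repeat_sep 5 ';'   # prints ;;;;;
```

Note that `&` has a special meaning in the replacement text of `gsub`, so this sketch is only meant for plain separators such as `;` or `,`.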
I like this solution. It's certainly harder to read, but it's good that it has a dynamic `FS`. Nevertheless, two things: 1. About the efficiency, I don't know what your time results mean, because they depend on the number and size of the files. 2. I wanted to try this to compare its timing with my version, but I hit a problem: `awk` complains that `gensub` is undefined. Is `gensub` maybe only in gawk?
– myradio
16 hours ago
Yeah, that seems to be specific to GNU Awk. I replaced it with the `gsub` solution from the linked answer.
– RoVo
15 hours ago
About the time results: I think I misread the statement "it takes a couple of seconds" in your answer as the time needed to process the example files from your question... I removed that.
– RoVo
15 hours ago
In order to count the lines in each file only once:

    wc -l *csv | sort -nr | sed 1d | {
        # After sed drops the "total" line, the longest file comes first.
        read -r max file
        pad=$(sed q "$file" | tr -cd ";")   # extract separators from first record
        while read -r lines file; do
            while [ $((lines+=1)) -le "$max" ]; do
                echo "$pad" >> "$file"
            done
        done
    }

Note that newlines in file names will cause problems for both `sort` and the `while read` loop, but file names containing ordinary spaces are handled correctly.
answered yesterday
JigglyNaga
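The `sed q "$file" | tr -cd ";"` step from the answer above can be tried in isolation. In this minimal sketch, `sep_row` is a hypothetical helper name. It assumes that no field contains a semicolon of its own (for instance inside quotes), since such a semicolon would be counted as a separator too.

```shell
# Hypothetical helper: print only the separators found in the first line
# of the file given as $1.
sep_row() {
    # sed q prints the first line and quits; tr -cd ';' deletes every
    # character that is not a semicolon, including the trailing newline.
    sed q "$1" | tr -cd ';'
}
```

For a file whose first row is `128; pep; 93; 22:22:10; 3; 11`, calling `sep_row` on it yields `;;;;;`, which is the padding line the question asks for.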