Parallelize a Bash FOR Loop
I have been trying to parallelize the following script, specifically each of the three FOR-loop instances, using GNU Parallel but haven't been able to. The 4 commands inside the FOR loop run in series, with each iteration taking around 10 minutes.
#!/bin/bash
kar='KAR5'
runList='run2 run3 run4'
mkdir normFunc
for run in $runList
do
fsl5.0-flirt -in $kar"deformed.nii.gz" -ref normtemp.nii.gz -omat $run".norm1.mat" -bins 256 -cost corratio -searchrx -90 90 -searchry -90 90 -searchrz -90 90 -dof 12
fsl5.0-flirt -in $run".poststats.nii.gz" -ref $kar"deformed.nii.gz" -omat $run".norm2.mat" -bins 256 -cost corratio -searchrx -90 90 -searchry -90 90 -searchrz -90 90 -dof 12
fsl5.0-convert_xfm -concat $run".norm1.mat" -omat $run".norm.mat" $run".norm2.mat"
fsl5.0-flirt -in $run".poststats.nii.gz" -ref normtemp.nii.gz -out $PWD/normFunc/$run".norm.nii.gz" -applyxfm -init $run".norm.mat" -interp trilinear
rm -f *.mat
done
Tags: shell-script, gnu-parallel
asked Dec 5 '13 at 21:04 by Ravnoor S Gill; edited Nov 27 '16 at 14:42 by Jeff Schaller
10 Answers
Why don't you just fork (aka. background) them?
foo () {
local run=$1
fsl5.0-flirt -in $kar"deformed.nii.gz" -ref normtemp.nii.gz -omat $run".norm1.mat" -bins 256 -cost corratio -searchrx -90 90 -searchry -90 90 -searchrz -90 90 -dof 12
fsl5.0-flirt -in $run".poststats.nii.gz" -ref $kar"deformed.nii.gz" -omat $run".norm2.mat" -bins 256 -cost corratio -searchrx -90 90 -searchry -90 90 -searchrz -90 90 -dof 12
fsl5.0-convert_xfm -concat $run".norm1.mat" -omat $run".norm.mat" $run".norm2.mat"
fsl5.0-flirt -in $run".poststats.nii.gz" -ref normtemp.nii.gz -out $PWD/normFunc/$run".norm.nii.gz" -applyxfm -init $run".norm.mat" -interp trilinear
}
for run in $runList; do foo "$run" & done
In case that's not clear, the significant part is here:
for run in $runList; do foo "$run" & done
^
Causing the function to be executed in a forked shell in the background. That's parallel.
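As the comments below note, you will usually also want the script to wait for the backgrounded jobs before exiting. A minimal sketch of the same loop with a trailing wait (using the question's $runList):
for run in $runList; do foo "$run" & done
wait    # block until every backgrounded foo has finished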
answered Dec 5 '13 at 21:11 by goldilocks

Comments:

That worked like a charm. Thank you. Such a simple implementation (makes me feel so stupid now!). – Ravnoor S Gill, Dec 5 '13 at 21:24

In case I had 8 files to run in parallel but only 4 cores, could that be integrated in such a setting or would that require a job scheduler? – Ravnoor S Gill, Dec 5 '13 at 21:27

It doesn't really matter in this context; it's normal for the system to have more active processes than cores. If you have many short tasks, ideally you would feed a queue serviced by a number of worker threads < the number of cores. I don't know how often that is really done with shell scripting (in which case they wouldn't be threads, they'd be independent processes), but with relatively few long tasks it would be pointless. The OS scheduler will take care of them. – goldilocks, Dec 5 '13 at 21:50

You also might want to add a wait command at the end so the master script does not exit until all of the background jobs do. – psusi, Nov 19 '15 at 0:22

I would also find it useful to limit the number of concurrent processes: my processes each use 100% of a core's time for about 25 minutes. This is on a shared server with 16 cores, where many people are running jobs. I need to run 23 copies of the script. If I run them all concurrently, I swamp the server and make it useless for everyone else for an hour or two (load goes up to 30, everything else slows way down). I guess it could be done with nice, but then I don't know if it'd ever finish. – naught101, Nov 26 '15 at 23:00
Sample task
task(){
sleep 0.5; echo "$1";
}
Sequential runs
for thing in a b c d e f g; do
task "$thing"
done
Parallel runs
for thing in a b c d e f g; do
task "$thing" &
done
Parallel runs in N-process batches
N=4
(
for thing in a b c d e f g; do
((i=i%N)); ((i++==0)) && wait
task "$thing" &
done
)
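For readability, here is the same batching idiom with comments; the final wait outside the loop is my addition, so that the last, possibly partial, batch is also waited for:
N=4
(
for thing in a b c d e f g; do
  ((i=i%N))            # wrap the counter back to 0 every N iterations
  ((i++==0)) && wait   # at the start of each batch, wait for the previous one
  task "$thing" &
done
wait                   # added: also wait for the final, possibly partial, batch
)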
It's also possible to use FIFOs as semaphores and use them to ensure that new processes are spawned as soon as possible and that no more than N processes run at the same time. But it requires more code.
N processes with a FIFO-based semaphore:
open_sem(){
mkfifo pipe-$$
exec 3<>pipe-$$
rm pipe-$$
local i=$1
for((;i>0;i--)); do
printf %s 000 >&3
done
}
run_with_lock(){
local x
read -u 3 -n 3 x && ((0==x)) || exit $x
(
( "$@"; )
printf '%.3d' $? >&3
)&
}
N=4
open_sem $N
for thing in {a..g}; do
run_with_lock task $thing
done
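If the script must not exit before the locked jobs complete, a trailing wait can be added after the loop (my addition, not part of the original snippet):
wait   # wait for all jobs started via run_with_lock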
answered Jul 16 '15 at 14:05 by PSkocik

Comments:

The line with wait in it basically lets all processes run until it hits the nth process, then waits for all of the others to finish running, is that right? – naught101, Nov 26 '15 at 23:03

If i is zero, call wait. Increment i after the zero test. – PSkocik, Nov 26 '15 at 23:08

Love the n parallel runs! Thank you. – joshperry, Sep 15 '16 at 16:31

@naught101 Yes. wait with no argument waits for all children. That makes it a little wasteful. The pipe-based semaphore approach gives you more fluent concurrency (I've been using that in a custom shell-based build system along with -nt/-ot checks successfully for a while now). – PSkocik, Mar 10 at 20:02

What does "$1" mean here? – kyle, Apr 8 at 0:01
for stuff in things
do
( something
with
stuff ) &
done
wait # for all the something with stuff
Whether it actually works depends on your commands; I'm not familiar with them. The rm *.mat
looks a bit prone to conflicts if it runs in parallel...
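One way to avoid that conflict (a sketch on my part, assuming the matrices are named $run*.mat as in the question) is to clean up only the current run's files inside each subshell:
for run in $runList
do
  (
  # ... the four fsl5.0 commands for "$run" go here ...
  rm -f "$run"*.mat   # remove only this run's transform matrices
  ) &
done
wait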
answered Dec 5 '13 at 21:10 by frostschutz

Comments:

This runs perfectly as well. You are right, I would have to change rm *.mat to something like rm $run".mat" to get it to work without one process interfering with the other. Thank you. – Ravnoor S Gill, Dec 5 '13 at 21:38

@RavnoorSGill Welcome to Stack Exchange! If this answer solved your problem, please mark it as accepted by ticking the check mark next to it. – Gilles, Dec 5 '13 at 23:54

+1 for wait, which I forgot. – goldilocks, Dec 6 '13 at 12:13

If there are tons of 'things', won't this start tons of processes? It would be better to start only a sane number of processes simultaneously, right? – David Doria, Mar 20 '15 at 15:17

@DavidDoria Sure, this is meant for small scale. (The example in the question had only three items.) I use this style for unlocking a dozen LUKS containers on bootup... if I had a lot more, I'd have to use some other method, but on a small scale this is simple enough. – frostschutz, Mar 20 '15 at 16:41
for stuff in things
do
sem -j+0 ( something
with
stuff )
done
sem --wait
This will use semaphores, parallelizing as many iterations as the number of available cores (-j +0 means you will parallelize N+0 jobs, where N is the number of available cores).
sem --wait tells it to wait until all the iterations in the for loop have finished before executing the subsequent lines of code.
Note: you will need "parallel" from the GNU parallel project (sudo apt-get install parallel).
answered Jul 16 '15 at 13:34 by lev

Comments:

Is it possible to go past 60? Mine throws an error saying not enough file descriptors. – chovy, Nov 27 '15 at 7:47
One really easy way that I often use:
cat "args" | xargs -P $NUM_PARALLEL command
This will run the command in parallel, feeding it the arguments from the "args" file and running at most $NUM_PARALLEL invocations at the same time (add -n 1 if you want exactly one argument per invocation).
You can also look into the -I option for xargs, if you need to substitute the input arguments in different places.
answered Jan 28 '17 at 7:05 by eyeApps LLC
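For example, here is a small sketch (mine, not from the answer above) using -I, assuming GNU xargs and an args file with one item per line; backup/ is a hypothetical destination directory:
# {} is replaced by each input item; at most 4 copies run at once
xargs -P 4 -I {} cp {} backup/{} < args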
It seems the fsl jobs depend on each other, so the 4 jobs cannot be run in parallel. The runs, however, can be run in parallel.
Make a bash function running a single run and run that function in parallel:
#!/bin/bash
myfunc() {
run=$1
kar='KAR5'
mkdir normFunc
fsl5.0-flirt -in $kar"deformed.nii.gz" -ref normtemp.nii.gz -omat $run".norm1.mat" -bins 256 -cost corratio -searchrx -90 90 -searchry -90 90 -searchrz -90 90 -dof 12
fsl5.0-flirt -in $run".poststats.nii.gz" -ref $kar"deformed.nii.gz" -omat $run".norm2.mat" -bins 256 -cost corratio -searchrx -90 90 -searchry -90 90 -searchrz -90 90 -dof 12
fsl5.0-convert_xfm -concat $run".norm1.mat" -omat $run".norm.mat" $run".norm2.mat"
fsl5.0-flirt -in $run".poststats.nii.gz" -ref normtemp.nii.gz -out $PWD/normFunc/$run".norm.nii.gz" -applyxfm -init $run".norm.mat" -interp trilinear
}
export -f myfunc
parallel myfunc ::: run2 run3 run4
To learn more watch the intro videos: https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1 and spend an hour walking through the tutorial http://www.gnu.org/software/parallel/parallel_tutorial.html Your command line will love you for it.
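Note (not part of the original answer): GNU parallel defaults to one job per CPU core; to cap the number of simultaneous runs differently, for example two at a time, pass -j:
parallel -j 2 myfunc ::: run2 run3 run4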
answered Dec 6 '13 at 10:16 by Ole Tange

Comments:

If you're using a non-bash shell you'll also need to export SHELL=/bin/bash before running parallel. Otherwise you'll get an error like: Unknown command 'myfunc arg'. – AndrewHarvey, Jul 31 '15 at 3:39

@AndrewHarvey: isn't that what the shebang is for? – naught101, Nov 26 '15 at 23:02
Parallel execution with at most N concurrent processes (note: wait -n requires bash 4.3 or newer):
#!/bin/bash
N=4
for i in {a..z}; do
(
# .. do your stuff here
echo "starting task $i.."
sleep $(( (RANDOM % 3) + 1))
) &
# allow at most $N jobs in parallel (the job just launched counts toward the limit)
if [[ $(jobs -r -p | wc -l) -ge $N ]]; then
# wait only for first job
wait -n
fi
done
# wait for pending jobs
wait
echo "all done"
answered Apr 10 at 8:04 by Tomasz Hławiczka
I had trouble with @PSkocik's solution. My system does not have GNU Parallel available as a package, and sem threw an exception when I built and ran it manually. I then tried the FIFO semaphore example as well, which also threw some other errors regarding communication.
@eyeApps suggested xargs, but I didn't know how to make it work with my complex use case (examples would be welcome).
Here is my solution for parallel jobs which process up to N jobs at a time, as configured by _jobs_set_max_parallel:
_lib_jobs.sh:
function _jobs_get_count_e {
jobs -r | wc -l | tr -d " "
}
function _jobs_set_max_parallel {
g_jobs_max_jobs=$1
}
function _jobs_get_max_parallel_e {
[[ $g_jobs_max_jobs ]] && {
echo $g_jobs_max_jobs
return 0
}
echo 1   # default when no maximum has been set
}
function _jobs_is_parallel_available_r() {
(( $(_jobs_get_count_e) < $g_jobs_max_jobs )) &&
return 0
return 1
}
function _jobs_wait_parallel() {
# Sleep between available jobs
while true; do
_jobs_is_parallel_available_r &&
break
sleep 0.1s
done
}
function _jobs_wait() {
wait
}
Example usage:
#!/bin/bash
source "_lib_jobs.sh"
_jobs_set_max_parallel 3
# Run 10 jobs in parallel with varying amounts of work
for a in {1..10}; do
_jobs_wait_parallel
# Sleep between 1-2 seconds to simulate busy work
sleep_delay=$(echo "scale=1; $(shuf -i 10-20 -n 1)/10" | bc -l)
( ### ASYNC
echo $a
sleep ${sleep_delay}s
) &
done
# Visualize jobs
while true; do
n_jobs=$(_jobs_get_count_e)
[[ $n_jobs = 0 ]] &&
break
sleep 0.1s
done
In my case, I can't use semaphore (I'm in git-bash on Windows), so I came up with a generic way to split the task among N workers, before they begin.
It works well if the tasks take roughly the same amount of time. The disadvantage is that, if one of the workers takes a long time to do its part of the job, the others that already finished won't help.
Splitting the job among N workers (1 per core)
# array of assets, assuming at least 1 item exists
listAssets=( {a..z} ) # example: a b c d .. z
# listAssets=( ~/"path with spaces/"*.txt ) # could be file paths
# replace with your task
task() { # $1 = idWorker, $2 = asset
echo "Worker $1: Asset '$2' START!"
# simulating a task that randomly takes 3-6 seconds
sleep $(( ($RANDOM % 4) + 3 ))
echo " Worker $1: Asset '$2' OK!"
}
nVirtualCores=$(nproc --all)
nWorkers=$(( $nVirtualCores * 1 )) # I want 1 process per core
worker() { # $1 = idWorker
echo "Worker $1 GO!"
idAsset=0
for asset in "${listAssets[@]}"; do
# split assets among workers (using modulo); each worker will go through
# the list and select the asset only if it belongs to that worker
(( idAsset % nWorkers == $1 )) && task $1 "$asset"
(( idAsset++ ))
done
echo " Worker $1 ALL DONE!"
}
for (( idWorker=0; idWorker<nWorkers; idWorker++ )); do
# start workers in parallel, use 1 process for each
worker $idWorker &
done
wait # until all workers are done
I really like the answer from @lev as it provides control over the maximum number of processes in a very simple manner. However, as described in the manual, sem does not work with brackets.
for stuff in things
do
sem -j +0 "something;
with;
stuff"
done
sem --wait
Does the job.
-j +N Add N to the number of CPU cores. Run up to this many jobs in parallel. For compute intensive jobs -j +0 is useful as it will run number-of-cpu-cores jobs simultaneously.
-j -N Subtract N from the number of CPU cores. Run up to this many jobs in parallel. If the evaluated number is less than 1 then 1 will be used. See also --use-cpus-instead-of-cores.
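For instance, a small sketch (mine) that leaves one core free for other users on a shared machine, using the -j -N form quoted above:
for stuff in things
do
sem -j -1 "something; with; stuff"
done
sem --wait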
protected by Kusalananda Dec 17 at 10:34
Thank you for your interest in this question.
Because it has attracted low-quality or spam answers that had to be removed, posting an answer now requires 10 reputation on this site (the association bonus does not count).
Would you like to answer one of these unanswered questions instead?
10 Answers
10
active
oldest
votes
10 Answers
10
active
oldest
votes
active
oldest
votes
active
oldest
votes
Why don't you just fork (aka. background) them?
foo () {
local run=$1
fsl5.0-flirt -in $kar"deformed.nii.gz" -ref normtemp.nii.gz -omat $run".norm1.mat" -bins 256 -cost corratio -searchrx -90 90 -searchry -90 90 -searchrz -90 90 -dof 12
fsl5.0-flirt -in $run".poststats.nii.gz" -ref $kar"deformed.nii.gz" -omat $run".norm2.mat" -bins 256 -cost corratio -searchrx -90 90 -searchry -90 90 -searchrz -90 90 -dof 12
fsl5.0-convert_xfm -concat $run".norm1.mat" -omat $run".norm.mat" $run".norm2.mat"
fsl5.0-flirt -in $run".poststats.nii.gz" -ref normtemp.nii.gz -out $PWD/normFunc/$run".norm.nii.gz" -applyxfm -init $run".norm.mat" -interp trilinear
}
for run in $runList; do foo "$run" & done
In case that's not clear, the significant part is here:
for run in $runList; do foo "$run" & done
^
Causing the function to be executed in a forked shell in the background. That's parallel.
5
That worked like a charm. Thank you. Such a simple implementation (Makes me feel so stupid now!).
– Ravnoor S Gill
Dec 5 '13 at 21:24
7
In case I had 8 files to run in parallel but only 4 cores, could that be integrated in such a setting or would that require a Job Scheduler?
– Ravnoor S Gill
Dec 5 '13 at 21:27
5
It doesn't really matter in this context; it's normal for the system to have more active processes than cores. If you have many short tasks, ideally you would feed a queue serviced by a number or worker threads < the number of cores. I don't know how often that is really done with shell scripting (in which case, they wouldn't be threads, they'd be independent processes) but with relatively few long tasks it would be pointless. The OS scheduler will take care of them.
– goldilocks
Dec 5 '13 at 21:50
12
You also might want to add await
command at the end so the master script does not exit until all of the background jobs do.
– psusi
Nov 19 '15 at 0:22
1
I would also fine it useful to limit the number of concurrent processes: my processes each use 100% of a core's time for about 25 minutes. This is on a shared server with 16 cores, where many people are running jobs. I need to run 23 copies of the script. If I run them all concurrently, then I swamp the server, and make it useless for everyone else for an hour or two (load goes up to 30, everything else slows way down). I guess it could be done withnice
, but then I don't know if it'd ever finish..
– naught101
Nov 26 '15 at 23:00
|
show 3 more comments
Why don't you just fork (aka. background) them?
foo () {
local run=$1
fsl5.0-flirt -in $kar"deformed.nii.gz" -ref normtemp.nii.gz -omat $run".norm1.mat" -bins 256 -cost corratio -searchrx -90 90 -searchry -90 90 -searchrz -90 90 -dof 12
fsl5.0-flirt -in $run".poststats.nii.gz" -ref $kar"deformed.nii.gz" -omat $run".norm2.mat" -bins 256 -cost corratio -searchrx -90 90 -searchry -90 90 -searchrz -90 90 -dof 12
fsl5.0-convert_xfm -concat $run".norm1.mat" -omat $run".norm.mat" $run".norm2.mat"
fsl5.0-flirt -in $run".poststats.nii.gz" -ref normtemp.nii.gz -out $PWD/normFunc/$run".norm.nii.gz" -applyxfm -init $run".norm.mat" -interp trilinear
}
for run in $runList; do foo "$run" & done
In case that's not clear, the significant part is here:
for run in $runList; do foo "$run" & done
^
Causing the function to be executed in a forked shell in the background. That's parallel.
5
That worked like a charm. Thank you. Such a simple implementation (Makes me feel so stupid now!).
– Ravnoor S Gill
Dec 5 '13 at 21:24
7
In case I had 8 files to run in parallel but only 4 cores, could that be integrated in such a setting or would that require a Job Scheduler?
– Ravnoor S Gill
Dec 5 '13 at 21:27
5
It doesn't really matter in this context; it's normal for the system to have more active processes than cores. If you have many short tasks, ideally you would feed a queue serviced by a number or worker threads < the number of cores. I don't know how often that is really done with shell scripting (in which case, they wouldn't be threads, they'd be independent processes) but with relatively few long tasks it would be pointless. The OS scheduler will take care of them.
– goldilocks
Dec 5 '13 at 21:50
12
You also might want to add await
command at the end so the master script does not exit until all of the background jobs do.
– psusi
Nov 19 '15 at 0:22
1
I would also fine it useful to limit the number of concurrent processes: my processes each use 100% of a core's time for about 25 minutes. This is on a shared server with 16 cores, where many people are running jobs. I need to run 23 copies of the script. If I run them all concurrently, then I swamp the server, and make it useless for everyone else for an hour or two (load goes up to 30, everything else slows way down). I guess it could be done withnice
, but then I don't know if it'd ever finish..
– naught101
Nov 26 '15 at 23:00
|
show 3 more comments
Why don't you just fork (aka. background) them?
foo () {
local run=$1
fsl5.0-flirt -in $kar"deformed.nii.gz" -ref normtemp.nii.gz -omat $run".norm1.mat" -bins 256 -cost corratio -searchrx -90 90 -searchry -90 90 -searchrz -90 90 -dof 12
fsl5.0-flirt -in $run".poststats.nii.gz" -ref $kar"deformed.nii.gz" -omat $run".norm2.mat" -bins 256 -cost corratio -searchrx -90 90 -searchry -90 90 -searchrz -90 90 -dof 12
fsl5.0-convert_xfm -concat $run".norm1.mat" -omat $run".norm.mat" $run".norm2.mat"
fsl5.0-flirt -in $run".poststats.nii.gz" -ref normtemp.nii.gz -out $PWD/normFunc/$run".norm.nii.gz" -applyxfm -init $run".norm.mat" -interp trilinear
}
for run in $runList; do foo "$run" & done
In case that's not clear, the significant part is here:
for run in $runList; do foo "$run" & done
^
Causing the function to be executed in a forked shell in the background. That's parallel.
Why don't you just fork (aka. background) them?
foo () {
local run=$1
fsl5.0-flirt -in $kar"deformed.nii.gz" -ref normtemp.nii.gz -omat $run".norm1.mat" -bins 256 -cost corratio -searchrx -90 90 -searchry -90 90 -searchrz -90 90 -dof 12
fsl5.0-flirt -in $run".poststats.nii.gz" -ref $kar"deformed.nii.gz" -omat $run".norm2.mat" -bins 256 -cost corratio -searchrx -90 90 -searchry -90 90 -searchrz -90 90 -dof 12
fsl5.0-convert_xfm -concat $run".norm1.mat" -omat $run".norm.mat" $run".norm2.mat"
fsl5.0-flirt -in $run".poststats.nii.gz" -ref normtemp.nii.gz -out $PWD/normFunc/$run".norm.nii.gz" -applyxfm -init $run".norm.mat" -interp trilinear
}
for run in $runList; do foo "$run" & done
In case that's not clear, the significant part is here:
for run in $runList; do foo "$run" & done
^
Causing the function to be executed in a forked shell in the background. That's parallel.
edited Dec 5 '13 at 21:18
jordanm
30.2k28292
30.2k28292
answered Dec 5 '13 at 21:11
goldilocks
61.5k13151207
61.5k13151207
5
That worked like a charm. Thank you. Such a simple implementation (Makes me feel so stupid now!).
– Ravnoor S Gill
Dec 5 '13 at 21:24
7
In case I had 8 files to run in parallel but only 4 cores, could that be integrated in such a setting or would that require a Job Scheduler?
– Ravnoor S Gill
Dec 5 '13 at 21:27
5
It doesn't really matter in this context; it's normal for the system to have more active processes than cores. If you have many short tasks, ideally you would feed a queue serviced by a number or worker threads < the number of cores. I don't know how often that is really done with shell scripting (in which case, they wouldn't be threads, they'd be independent processes) but with relatively few long tasks it would be pointless. The OS scheduler will take care of them.
– goldilocks
Dec 5 '13 at 21:50
12
You also might want to add await
command at the end so the master script does not exit until all of the background jobs do.
– psusi
Nov 19 '15 at 0:22
1
I would also fine it useful to limit the number of concurrent processes: my processes each use 100% of a core's time for about 25 minutes. This is on a shared server with 16 cores, where many people are running jobs. I need to run 23 copies of the script. If I run them all concurrently, then I swamp the server, and make it useless for everyone else for an hour or two (load goes up to 30, everything else slows way down). I guess it could be done withnice
, but then I don't know if it'd ever finish..
– naught101
Nov 26 '15 at 23:00
|
show 3 more comments
5
That worked like a charm. Thank you. Such a simple implementation (Makes me feel so stupid now!).
– Ravnoor S Gill
Dec 5 '13 at 21:24
7
In case I had 8 files to run in parallel but only 4 cores, could that be integrated in such a setting or would that require a Job Scheduler?
– Ravnoor S Gill
Dec 5 '13 at 21:27
5
It doesn't really matter in this context; it's normal for the system to have more active processes than cores. If you have many short tasks, ideally you would feed a queue serviced by a number or worker threads < the number of cores. I don't know how often that is really done with shell scripting (in which case, they wouldn't be threads, they'd be independent processes) but with relatively few long tasks it would be pointless. The OS scheduler will take care of them.
– goldilocks
Dec 5 '13 at 21:50
12
You also might want to add await
command at the end so the master script does not exit until all of the background jobs do.
– psusi
Nov 19 '15 at 0:22
1
I would also fine it useful to limit the number of concurrent processes: my processes each use 100% of a core's time for about 25 minutes. This is on a shared server with 16 cores, where many people are running jobs. I need to run 23 copies of the script. If I run them all concurrently, then I swamp the server, and make it useless for everyone else for an hour or two (load goes up to 30, everything else slows way down). I guess it could be done withnice
, but then I don't know if it'd ever finish..
– naught101
Nov 26 '15 at 23:00
5
5
That worked like a charm. Thank you. Such a simple implementation (Makes me feel so stupid now!).
– Ravnoor S Gill
Dec 5 '13 at 21:24
That worked like a charm. Thank you. Such a simple implementation (Makes me feel so stupid now!).
– Ravnoor S Gill
Dec 5 '13 at 21:24
7
7
In case I had 8 files to run in parallel but only 4 cores, could that be integrated in such a setting or would that require a Job Scheduler?
– Ravnoor S Gill
Dec 5 '13 at 21:27
In case I had 8 files to run in parallel but only 4 cores, could that be integrated in such a setting or would that require a Job Scheduler?
– Ravnoor S Gill
Dec 5 '13 at 21:27
5
5
It doesn't really matter in this context; it's normal for the system to have more active processes than cores. If you have many short tasks, ideally you would feed a queue serviced by a number or worker threads < the number of cores. I don't know how often that is really done with shell scripting (in which case, they wouldn't be threads, they'd be independent processes) but with relatively few long tasks it would be pointless. The OS scheduler will take care of them.
– goldilocks
Dec 5 '13 at 21:50
It doesn't really matter in this context; it's normal for the system to have more active processes than cores. If you have many short tasks, ideally you would feed a queue serviced by a number or worker threads < the number of cores. I don't know how often that is really done with shell scripting (in which case, they wouldn't be threads, they'd be independent processes) but with relatively few long tasks it would be pointless. The OS scheduler will take care of them.
– goldilocks
Dec 5 '13 at 21:50
12
12
You also might want to add a
wait
command at the end so the master script does not exit until all of the background jobs do.– psusi
Nov 19 '15 at 0:22
You also might want to add a
wait
command at the end so the master script does not exit until all of the background jobs do.– psusi
Nov 19 '15 at 0:22
1
1
I would also fine it useful to limit the number of concurrent processes: my processes each use 100% of a core's time for about 25 minutes. This is on a shared server with 16 cores, where many people are running jobs. I need to run 23 copies of the script. If I run them all concurrently, then I swamp the server, and make it useless for everyone else for an hour or two (load goes up to 30, everything else slows way down). I guess it could be done with
nice
, but then I don't know if it'd ever finish..– naught101
Nov 26 '15 at 23:00
I would also fine it useful to limit the number of concurrent processes: my processes each use 100% of a core's time for about 25 minutes. This is on a shared server with 16 cores, where many people are running jobs. I need to run 23 copies of the script. If I run them all concurrently, then I swamp the server, and make it useless for everyone else for an hour or two (load goes up to 30, everything else slows way down). I guess it could be done with
nice
, but then I don't know if it'd ever finish..– naught101
Nov 26 '15 at 23:00
|
show 3 more comments
Sample task
task(){
sleep 0.5; echo "$1";
}
Sequential runs
for thing in a b c d e f g; do
task "$thing"
done
Parallel runs
for thing in a b c d e f g; do
task "$thing" &
done
Parallel runs in N-process batches
N=4
(
for thing in a b c d e f g; do
((i=i%N)); ((i++==0)) && wait
task "$thing" &
done
)
It's also possible to use FIFOs as semaphores and use them to ensure that new processes are spawned as soon as possible and that no more than N processes runs at the same time. But it requires more code.
N processes with a FIFO-based semaphore:
open_sem(){
mkfifo pipe-$$
exec 3<>pipe-$$
rm pipe-$$
local i=$1
for((;i>0;i--)); do
printf %s 000 >&3
done
}
run_with_lock(){
local x
read -u 3 -n 3 x && ((0==x)) || exit $x
(
( "$@"; )
printf '%.3d' $? >&3
)&
}
N=4
open_sem $N
for thing in {a..g}; do
run_with_lock task $thing
done
3
The line withwait
in it basically lets all processes run, until it hits thenth
process, then waits for all of the others to finish running, is that right?
– naught101
Nov 26 '15 at 23:03
Ifi
is zero, call wait. Incrementi
after the zero test.
– PSkocik
Nov 26 '15 at 23:08
1
Love the n parallel runs! Thank you.
– joshperry
Sep 15 '16 at 16:31
1
@naught101 Yes.wait
w/ no arg waits for all children. That makes it a little wasteful. The pipe-based-semaphore approach gives you more fluent concurrency (I've been using that in a custom shell based build system along with-nt
/-ot
checks successfully for a while now)
– PSkocik
Mar 10 at 20:02
what does "$1" mean here?
– kyle
Apr 8 at 0:01
|
show 3 more comments
Sample task
task(){
sleep 0.5; echo "$1";
}
Sequential runs
for thing in a b c d e f g; do
task "$thing"
done
Parallel runs
for thing in a b c d e f g; do
task "$thing" &
done
Parallel runs in N-process batches
N=4
(
for thing in a b c d e f g; do
((i=i%N)); ((i++==0)) && wait
task "$thing" &
done
)
It's also possible to use FIFOs as semaphores and use them to ensure that new processes are spawned as soon as possible and that no more than N processes runs at the same time. But it requires more code.
N processes with a FIFO-based semaphore:
open_sem(){
mkfifo pipe-$$
exec 3<>pipe-$$
rm pipe-$$
local i=$1
for((;i>0;i--)); do
printf %s 000 >&3
done
}
run_with_lock(){
local x
read -u 3 -n 3 x && ((0==x)) || exit $x
(
( "$@"; )
printf '%.3d' $? >&3
)&
}
N=4
open_sem $N
for thing in {a..g}; do
run_with_lock task $thing
done
3
The line withwait
in it basically lets all processes run, until it hits thenth
process, then waits for all of the others to finish running, is that right?
– naught101
Nov 26 '15 at 23:03
Ifi
is zero, call wait. Incrementi
after the zero test.
– PSkocik
Nov 26 '15 at 23:08
1
Love the n parallel runs! Thank you.
– joshperry
Sep 15 '16 at 16:31
1
@naught101 Yes.wait
w/ no arg waits for all children. That makes it a little wasteful. The pipe-based-semaphore approach gives you more fluent concurrency (I've been using that in a custom shell based build system along with-nt
/-ot
checks successfully for a while now)
– PSkocik
Mar 10 at 20:02
what does "$1" mean here?
– kyle
Apr 8 at 0:01
|
show 3 more comments
Sample task
task(){
sleep 0.5; echo "$1";
}
Sequential runs
for thing in a b c d e f g; do
task "$thing"
done
Parallel runs
for thing in a b c d e f g; do
task "$thing" &
done
Parallel runs in N-process batches
N=4
(
for thing in a b c d e f g; do
((i=i%N)); ((i++==0)) && wait
task "$thing" &
done
)
It's also possible to use FIFOs as semaphores and use them to ensure that new processes are spawned as soon as possible and that no more than N processes runs at the same time. But it requires more code.
N processes with a FIFO-based semaphore:
open_sem(){
mkfifo pipe-$$
exec 3<>pipe-$$
rm pipe-$$
local i=$1
for((;i>0;i--)); do
printf %s 000 >&3
done
}
run_with_lock(){
local x
read -u 3 -n 3 x && ((0==x)) || exit $x
(
( "$@"; )
printf '%.3d' $? >&3
)&
}
N=4
open_sem $N
for thing in {a..g}; do
run_with_lock task $thing
done
Sample task
task(){
sleep 0.5; echo "$1";
}
Sequential runs
for thing in a b c d e f g; do
task "$thing"
done
Parallel runs
for thing in a b c d e f g; do
task "$thing" &
done
Parallel runs in N-process batches
N=4
(
for thing in a b c d e f g; do
((i=i%N)); ((i++==0)) && wait
task "$thing" &
done
)
It's also possible to use FIFOs as semaphores and use them to ensure that new processes are spawned as soon as possible and that no more than N processes runs at the same time. But it requires more code.
N processes with a FIFO-based semaphore:
open_sem(){
mkfifo pipe-$$
exec 3<>pipe-$$
rm pipe-$$
local i=$1
for((;i>0;i--)); do
printf %s 000 >&3
done
}
run_with_lock(){
local x
read -u 3 -n 3 x && ((0==x)) || exit $x
(
( "$@"; )
printf '%.3d' $? >&3
)&
}
N=4
open_sem $N
for thing in {a..g}; do
run_with_lock task $thing
done
edited Dec 17 at 10:28
answered Jul 16 '15 at 14:05
PSkocik
17.7k44994
17.7k44994
3
The line withwait
in it basically lets all processes run, until it hits thenth
process, then waits for all of the others to finish running, is that right?
– naught101
Nov 26 '15 at 23:03
Ifi
is zero, call wait. Incrementi
after the zero test.
– PSkocik
Nov 26 '15 at 23:08
1
Love the n parallel runs! Thank you.
– joshperry
Sep 15 '16 at 16:31
1
@naught101 Yes.wait
w/ no arg waits for all children. That makes it a little wasteful. The pipe-based-semaphore approach gives you more fluent concurrency (I've been using that in a custom shell based build system along with-nt
/-ot
checks successfully for a while now)
– PSkocik
Mar 10 at 20:02
what does "$1" mean here?
– kyle
Apr 8 at 0:01
|
show 3 more comments
3
The line withwait
in it basically lets all processes run, until it hits thenth
process, then waits for all of the others to finish running, is that right?
– naught101
Nov 26 '15 at 23:03
Ifi
is zero, call wait. Incrementi
after the zero test.
– PSkocik
Nov 26 '15 at 23:08
1
Love the n parallel runs! Thank you.
– joshperry
Sep 15 '16 at 16:31
1
@naught101 Yes.wait
w/ no arg waits for all children. That makes it a little wasteful. The pipe-based-semaphore approach gives you more fluent concurrency (I've been using that in a custom shell based build system along with-nt
/-ot
checks successfully for a while now)
– PSkocik
Mar 10 at 20:02
what does "$1" mean here?
– kyle
Apr 8 at 0:01
3
3
The line with
wait
in it basically lets all processes run, until it hits the nth
process, then waits for all of the others to finish running, is that right?– naught101
Nov 26 '15 at 23:03
The line with
wait
in it basically lets all processes run, until it hits the nth
process, then waits for all of the others to finish running, is that right?– naught101
Nov 26 '15 at 23:03
If
i
is zero, call wait. Increment i
after the zero test.– PSkocik
Nov 26 '15 at 23:08
If
i
is zero, call wait. Increment i
after the zero test.– PSkocik
Nov 26 '15 at 23:08
1
1
Love the n parallel runs! Thank you.
– joshperry
Sep 15 '16 at 16:31
Love the n parallel runs! Thank you.
– joshperry
Sep 15 '16 at 16:31
1
1
@naught101 Yes.
wait
w/ no arg waits for all children. That makes it a little wasteful. The pipe-based-semaphore approach gives you more fluent concurrency (I've been using that in a custom shell based build system along with -nt
/-ot
checks successfully for a while now)– PSkocik
Mar 10 at 20:02
@naught101 Yes.
wait
w/ no arg waits for all children. That makes it a little wasteful. The pipe-based-semaphore approach gives you more fluent concurrency (I've been using that in a custom shell based build system along with -nt
/-ot
checks successfully for a while now)– PSkocik
Mar 10 at 20:02
what does "$1" mean here?
– kyle
Apr 8 at 0:01
what does "$1" mean here?
– kyle
Apr 8 at 0:01
|
show 3 more comments
for stuff in things
do
( something
with
stuff ) &
done
wait # for all the something with stuff
Whether it actually works depends on your commands; I'm not familiar with them. The rm *.mat
looks a bit prone to conflicts if it runs in parallel...
2
This runs perfectly as well. You are right I would have to changerm *.mat
to something likerm $run".mat"
to get it to work without one process interfering with the other. Thank you.
– Ravnoor S Gill
Dec 5 '13 at 21:38
@RavnoorSGill Welcome to Stack Exchange! If this answer solved your problem, please mark it as accepted by ticking the check mark next to it.
– Gilles
Dec 5 '13 at 23:54
5
+1 forwait
, which I forgot.
– goldilocks
Dec 6 '13 at 12:13
3
If there are tons of 'things', won't this start tons of processes? It would be better to start only a sane number of processes simultaneously, right?
– David Doria
Mar 20 '15 at 15:17
@DavidDoria sure, this is meant for small scale. (The example in the question had only three items). I use this style for unlocking a dozen LUKS containers on bootup... if I had a lot more, I'd have to use some other method, but on a small scale this is simple enough.
– frostschutz
Mar 20 '15 at 16:41
add a comment |
for stuff in things
do
( something
with
stuff ) &
done
wait # for all the something with stuff
Whether it actually works depends on your commands; I'm not familiar with them. The rm *.mat
looks a bit prone to conflicts if it runs in parallel...
2
This runs perfectly as well. You are right I would have to changerm *.mat
to something likerm $run".mat"
to get it to work without one process interfering with the other. Thank you.
– Ravnoor S Gill
Dec 5 '13 at 21:38
@RavnoorSGill Welcome to Stack Exchange! If this answer solved your problem, please mark it as accepted by ticking the check mark next to it.
– Gilles
Dec 5 '13 at 23:54
5
+1 forwait
, which I forgot.
– goldilocks
Dec 6 '13 at 12:13
3
If there are tons of 'things', won't this start tons of processes? It would be better to start only a sane number of processes simultaneously, right?
– David Doria
Mar 20 '15 at 15:17
@DavidDoria sure, this is meant for small scale. (The example in the question had only three items). I use this style for unlocking a dozen LUKS containers on bootup... if I had a lot more, I'd have to use some other method, but on a small scale this is simple enough.
– frostschutz
Mar 20 '15 at 16:41
add a comment |
for stuff in things
do
( something
with
stuff ) &
done
wait # for all the something with stuff
Whether it actually works depends on your commands; I'm not familiar with them. The rm *.mat
looks a bit prone to conflicts if it runs in parallel...
for stuff in things
do
( something
with
stuff ) &
done
wait # for all the something with stuff
Whether it actually works depends on your commands; I'm not familiar with them. The rm *.mat
looks a bit prone to conflicts if it runs in parallel...
edited Dec 5 '13 at 21:24
answered Dec 5 '13 at 21:10
frostschutz
25.8k15280
25.8k15280
2
This runs perfectly as well. You are right I would have to changerm *.mat
to something likerm $run".mat"
to get it to work without one process interfering with the other. Thank you.
– Ravnoor S Gill
Dec 5 '13 at 21:38
@RavnoorSGill Welcome to Stack Exchange! If this answer solved your problem, please mark it as accepted by ticking the check mark next to it.
– Gilles
Dec 5 '13 at 23:54
5
+1 forwait
, which I forgot.
– goldilocks
Dec 6 '13 at 12:13
3
If there are tons of 'things', won't this start tons of processes? It would be better to start only a sane number of processes simultaneously, right?
– David Doria
Mar 20 '15 at 15:17
@DavidDoria sure, this is meant for small scale. (The example in the question had only three items). I use this style for unlocking a dozen LUKS containers on bootup... if I had a lot more, I'd have to use some other method, but on a small scale this is simple enough.
– frostschutz
Mar 20 '15 at 16:41
add a comment |
2
This runs perfectly as well. You are right I would have to changerm *.mat
to something likerm $run".mat"
to get it to work without one process interfering with the other. Thank you.
– Ravnoor S Gill
Dec 5 '13 at 21:38
@RavnoorSGill Welcome to Stack Exchange! If this answer solved your problem, please mark it as accepted by ticking the check mark next to it.
– Gilles
Dec 5 '13 at 23:54
5
+1 forwait
, which I forgot.
– goldilocks
Dec 6 '13 at 12:13
3
If there are tons of 'things', won't this start tons of processes? It would be better to start only a sane number of processes simultaneously, right?
– David Doria
Mar 20 '15 at 15:17
@DavidDoria sure, this is meant for small scale. (The example in the question had only three items). I use this style for unlocking a dozen LUKS containers on bootup... if I had a lot more, I'd have to use some other method, but on a small scale this is simple enough.
– frostschutz
Mar 20 '15 at 16:41
2
2
This runs perfectly as well. You are right I would have to change
rm *.mat
to something like rm $run".mat"
to get it to work without one process interfering with the other. Thank you.– Ravnoor S Gill
Dec 5 '13 at 21:38
This runs perfectly as well. You are right I would have to change
rm *.mat
to something like rm $run".mat"
to get it to work without one process interfering with the other. Thank you.– Ravnoor S Gill
Dec 5 '13 at 21:38
@RavnoorSGill Welcome to Stack Exchange! If this answer solved your problem, please mark it as accepted by ticking the check mark next to it.
– Gilles
Dec 5 '13 at 23:54
@RavnoorSGill Welcome to Stack Exchange! If this answer solved your problem, please mark it as accepted by ticking the check mark next to it.
– Gilles
Dec 5 '13 at 23:54
5
5
+1 for
wait
, which I forgot.– goldilocks
Dec 6 '13 at 12:13
+1 for
wait
, which I forgot.– goldilocks
Dec 6 '13 at 12:13
3
3
If there are tons of 'things', won't this start tons of processes? It would be better to start only a sane number of processes simultaneously, right?
– David Doria
Mar 20 '15 at 15:17
If there are tons of 'things', won't this start tons of processes? It would be better to start only a sane number of processes simultaneously, right?
– David Doria
Mar 20 '15 at 15:17
@DavidDoria sure, this is meant for small scale. (The example in the question had only three items). I use this style for unlocking a dozen LUKS containers on bootup... if I had a lot more, I'd have to use some other method, but on a small scale this is simple enough.
– frostschutz
Mar 20 '15 at 16:41
@DavidDoria sure, this is meant for small scale. (The example in the question had only three items). I use this style for unlocking a dozen LUKS containers on bootup... if I had a lot more, I'd have to use some other method, but on a small scale this is simple enough.
– frostschutz
Mar 20 '15 at 16:41
add a comment |
for stuff in things
do
sem -j+0 ( something
with
stuff )
done
sem --wait
This will use semaphores, parallelizing as many iterations as the number of available cores (-j +0 means you will parallelize N+0 jobs, where N is the number of available cores).
sem --wait tells to wait until all the iterations in the for loop have terminated execution before executing the successive lines of code.
Note: you will need "parallel" from the GNU parallel project (sudo apt-get install parallel).
1
is it possible to go past 60? mine throws an error saying not enough file descriptors.
– chovy
Nov 27 '15 at 7:47
add a comment |
for stuff in things
do
sem -j+0 ( something
with
stuff )
done
sem --wait
This will use semaphores, parallelizing as many iterations as the number of available cores (-j +0 means you will parallelize N+0 jobs, where N is the number of available cores).
sem --wait tells to wait until all the iterations in the for loop have terminated execution before executing the successive lines of code.
Note: you will need "parallel" from the GNU parallel project (sudo apt-get install parallel).
1
is it possible to go past 60? mine throws an error saying not enough file descriptors.
– chovy
Nov 27 '15 at 7:47
add a comment |
for stuff in things
do
sem -j+0 ( something
with
stuff )
done
sem --wait
This will use semaphores, parallelizing as many iterations as the number of available cores (-j +0 means you will parallelize N+0 jobs, where N is the number of available cores).
sem --wait tells to wait until all the iterations in the for loop have terminated execution before executing the successive lines of code.
Note: you will need "parallel" from the GNU parallel project (sudo apt-get install parallel).
for stuff in things
do
sem -j+0 ( something
with
stuff )
done
sem --wait
This will use semaphores, parallelizing as many iterations as the number of available cores (-j +0 means you will parallelize N+0 jobs, where N is the number of available cores).
sem --wait tells to wait until all the iterations in the for loop have terminated execution before executing the successive lines of code.
Note: you will need "parallel" from the GNU parallel project (sudo apt-get install parallel).
edited Feb 9 '17 at 22:40
Jonas Stein
1,12621135
1,12621135
answered Jul 16 '15 at 13:34
lev
38125
38125
1
is it possible to go past 60? mine throws an error saying not enough file descriptors.
– chovy
Nov 27 '15 at 7:47
add a comment |
1
is it possible to go past 60? mine throws an error saying not enough file descriptors.
– chovy
Nov 27 '15 at 7:47
1
1
is it possible to go past 60? mine throws an error saying not enough file descriptors.
– chovy
Nov 27 '15 at 7:47
is it possible to go past 60? mine throws an error saying not enough file descriptors.
– chovy
Nov 27 '15 at 7:47
add a comment |
One really easy way that I often use:
cat "args" | xargs -P $NUM_PARALLEL command
This will run the command, passing in each line of the "args" file, in parallel, running at most $NUM_PARALLEL at the same time.
You can also look into the -I option for xargs, if you need to substitute the input arguments in different places.
add a comment |
One really easy way that I often use:
cat "args" | xargs -P $NUM_PARALLEL command
This will run the command, passing in each line of the "args" file, in parallel, running at most $NUM_PARALLEL at the same time.
You can also look into the -I option for xargs, if you need to substitute the input arguments in different places.
add a comment |
One really easy way that I often use:
cat "args" | xargs -P $NUM_PARALLEL command
This will run the command, passing in each line of the "args" file, in parallel, running at most $NUM_PARALLEL at the same time.
You can also look into the -I option for xargs, if you need to substitute the input arguments in different places.
One really easy way that I often use:
cat "args" | xargs -P $NUM_PARALLEL command
This will run the command, passing in each line of the "args" file, in parallel, running at most $NUM_PARALLEL at the same time.
You can also look into the -I option for xargs, if you need to substitute the input arguments in different places.
answered Jan 28 '17 at 7:05
eyeApps LLC
17614
17614
add a comment |
add a comment |
It seems the fsl jobs are depending on eachother, so the 4 jobs cannot be run in parallel. The runs, however, can be run in parallel.
Make a bash function running a single run and run that function in parallel:
#!/bin/bash
myfunc() {
run=$1
kar='KAR5'
mkdir normFunc
fsl5.0-flirt -in $kar"deformed.nii.gz" -ref normtemp.nii.gz -omat $run".norm1.mat" -bins 256 -cost corratio -searchrx -90 90 -searchry -90 90 -searchrz -90 90 -dof 12
fsl5.0-flirt -in $run".poststats.nii.gz" -ref $kar"deformed.nii.gz" -omat $run".norm2.mat" -bins 256 -cost corratio -searchrx -90 90 -searchry -90 90 -searchrz -90 90 -dof 12
fsl5.0-convert_xfm -concat $run".norm1.mat" -omat $run".norm.mat" $run".norm2.mat"
fsl5.0-flirt -in $run".poststats.nii.gz" -ref normtemp.nii.gz -out $PWD/normFunc/$run".norm.nii.gz" -applyxfm -init $run".norm.mat" -interp trilinear
}
export -f myfunc
parallel myfunc ::: run2 run3 run4
To learn more watch the intro videos: https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1 and spend an hour walking through the tutorial http://www.gnu.org/software/parallel/parallel_tutorial.html Your command line will love you for it.
If you're using a non-bash shell you'll need to alsoexport SHELL=/bin/bash
before running parallel. Otherwise you'll get an error like:Unknown command 'myfunc arg'
– AndrewHarvey
Jul 31 '15 at 3:39
1
@AndrewHarvey: isn't that what the shebang is for?
– naught101
Nov 26 '15 at 23:02
add a comment |
It seems the fsl jobs are depending on eachother, so the 4 jobs cannot be run in parallel. The runs, however, can be run in parallel.
Make a bash function running a single run and run that function in parallel:
#!/bin/bash
myfunc() {
run=$1
kar='KAR5'
mkdir normFunc
fsl5.0-flirt -in $kar"deformed.nii.gz" -ref normtemp.nii.gz -omat $run".norm1.mat" -bins 256 -cost corratio -searchrx -90 90 -searchry -90 90 -searchrz -90 90 -dof 12
fsl5.0-flirt -in $run".poststats.nii.gz" -ref $kar"deformed.nii.gz" -omat $run".norm2.mat" -bins 256 -cost corratio -searchrx -90 90 -searchry -90 90 -searchrz -90 90 -dof 12
fsl5.0-convert_xfm -concat $run".norm1.mat" -omat $run".norm.mat" $run".norm2.mat"
fsl5.0-flirt -in $run".poststats.nii.gz" -ref normtemp.nii.gz -out $PWD/normFunc/$run".norm.nii.gz" -applyxfm -init $run".norm.mat" -interp trilinear
}
export -f myfunc
parallel myfunc ::: run2 run3 run4
To learn more watch the intro videos: https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1 and spend an hour walking through the tutorial http://www.gnu.org/software/parallel/parallel_tutorial.html Your command line will love you for it.
If you're using a non-bash shell you'll need to alsoexport SHELL=/bin/bash
before running parallel. Otherwise you'll get an error like:Unknown command 'myfunc arg'
– AndrewHarvey
Jul 31 '15 at 3:39
1
@AndrewHarvey: isn't that what the shebang is for?
– naught101
Nov 26 '15 at 23:02
add a comment |
It seems the fsl jobs are depending on eachother, so the 4 jobs cannot be run in parallel. The runs, however, can be run in parallel.
Make a bash function running a single run and run that function in parallel:
#!/bin/bash
myfunc() {
run=$1
kar='KAR5'
mkdir normFunc
fsl5.0-flirt -in $kar"deformed.nii.gz" -ref normtemp.nii.gz -omat $run".norm1.mat" -bins 256 -cost corratio -searchrx -90 90 -searchry -90 90 -searchrz -90 90 -dof 12
fsl5.0-flirt -in $run".poststats.nii.gz" -ref $kar"deformed.nii.gz" -omat $run".norm2.mat" -bins 256 -cost corratio -searchrx -90 90 -searchry -90 90 -searchrz -90 90 -dof 12
fsl5.0-convert_xfm -concat $run".norm1.mat" -omat $run".norm.mat" $run".norm2.mat"
fsl5.0-flirt -in $run".poststats.nii.gz" -ref normtemp.nii.gz -out $PWD/normFunc/$run".norm.nii.gz" -applyxfm -init $run".norm.mat" -interp trilinear
}
export -f myfunc
parallel myfunc ::: run2 run3 run4
To learn more watch the intro videos: https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1 and spend an hour walking through the tutorial http://www.gnu.org/software/parallel/parallel_tutorial.html Your command line will love you for it.
It seems the fsl jobs are depending on eachother, so the 4 jobs cannot be run in parallel. The runs, however, can be run in parallel.
Make a bash function running a single run and run that function in parallel:
#!/bin/bash
myfunc() {
run=$1
kar='KAR5'
mkdir normFunc
fsl5.0-flirt -in $kar"deformed.nii.gz" -ref normtemp.nii.gz -omat $run".norm1.mat" -bins 256 -cost corratio -searchrx -90 90 -searchry -90 90 -searchrz -90 90 -dof 12
fsl5.0-flirt -in $run".poststats.nii.gz" -ref $kar"deformed.nii.gz" -omat $run".norm2.mat" -bins 256 -cost corratio -searchrx -90 90 -searchry -90 90 -searchrz -90 90 -dof 12
fsl5.0-convert_xfm -concat $run".norm1.mat" -omat $run".norm.mat" $run".norm2.mat"
fsl5.0-flirt -in $run".poststats.nii.gz" -ref normtemp.nii.gz -out $PWD/normFunc/$run".norm.nii.gz" -applyxfm -init $run".norm.mat" -interp trilinear
}
export -f myfunc
parallel myfunc ::: run2 run3 run4
To learn more watch the intro videos: https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1 and spend an hour walking through the tutorial http://www.gnu.org/software/parallel/parallel_tutorial.html Your command line will love you for it.
answered Dec 6 '13 at 10:16
Ole Tange
12k1451105
12k1451105
If you're using a non-bash shell you'll need to alsoexport SHELL=/bin/bash
before running parallel. Otherwise you'll get an error like:Unknown command 'myfunc arg'
– AndrewHarvey
Jul 31 '15 at 3:39
1
@AndrewHarvey: isn't that what the shebang is for?
– naught101
Nov 26 '15 at 23:02
add a comment |
If you're using a non-bash shell you'll need to alsoexport SHELL=/bin/bash
before running parallel. Otherwise you'll get an error like:Unknown command 'myfunc arg'
– AndrewHarvey
Jul 31 '15 at 3:39
1
@AndrewHarvey: isn't that what the shebang is for?
– naught101
Nov 26 '15 at 23:02
If you're using a non-bash shell you'll need to also
export SHELL=/bin/bash
before running parallel. Otherwise you'll get an error like: Unknown command 'myfunc arg'
– AndrewHarvey
Jul 31 '15 at 3:39
If you're using a non-bash shell you'll need to also
export SHELL=/bin/bash
before running parallel. Otherwise you'll get an error like: Unknown command 'myfunc arg'
– AndrewHarvey
Jul 31 '15 at 3:39
1
1
@AndrewHarvey: isn't that what the shebang is for?
– naught101
Nov 26 '15 at 23:02
@AndrewHarvey: isn't that what the shebang is for?
– naught101
Nov 26 '15 at 23:02
add a comment |
Parallel execution in max N-process concurrent
#!/bin/bash
N=4
for i in {a..z}; do
(
# .. do your stuff here
echo "starting task $i.."
sleep $(( (RANDOM % 3) + 1))
) &
# allow only to execute $N jobs in parallel
if [[ $(jobs -r -p | wc -l) -gt $N ]]; then
# wait only for first job
wait -n
fi
done
# wait for pending jobs
wait
echo "all done"
add a comment |
Parallel execution in max N-process concurrent
#!/bin/bash
N=4
for i in {a..z}; do
(
# .. do your stuff here
echo "starting task $i.."
sleep $(( (RANDOM % 3) + 1))
) &
# allow only to execute $N jobs in parallel
if [[ $(jobs -r -p | wc -l) -gt $N ]]; then
# wait only for first job
wait -n
fi
done
# wait for pending jobs
wait
echo "all done"
add a comment |
Parallel execution in max N-process concurrent
#!/bin/bash
N=4
for i in {a..z}; do
(
# .. do your stuff here
echo "starting task $i.."
sleep $(( (RANDOM % 3) + 1))
) &
# allow only to execute $N jobs in parallel
if [[ $(jobs -r -p | wc -l) -gt $N ]]; then
# wait only for first job
wait -n
fi
done
# wait for pending jobs
wait
echo "all done"
Parallel execution in max N-process concurrent
#!/bin/bash
N=4
for i in {a..z}; do
(
# .. do your stuff here
echo "starting task $i.."
sleep $(( (RANDOM % 3) + 1))
) &
# allow only to execute $N jobs in parallel
if [[ $(jobs -r -p | wc -l) -gt $N ]]; then
# wait only for first job
wait -n
fi
done
# wait for pending jobs
wait
echo "all done"
answered Apr 10 at 8:04
Tomasz Hławiczka
212
212
add a comment |
add a comment |
I had trouble with @PSkocik
's solution. My system does not have GNU Parallel available as a package and sem
threw an exception when I built and ran it manually. I then tried the FIFO semaphore example as well which also threw some other errors regarding communication.
@eyeApps
suggested xargs but I didn't know how to make it work with my complex use case (examples would be welcome).
Here is my solution for parallel jobs which process up to N
jobs at a time as configured by _jobs_set_max_parallel
:
_lib_jobs.sh:
function _jobs_get_count_e {
jobs -r | wc -l | tr -d " "
}
function _jobs_set_max_parallel {
g_jobs_max_jobs=$1
}
function _jobs_get_max_parallel_e {
[[ $g_jobs_max_jobs ]] && {
echo $g_jobs_max_jobs
echo 0
}
echo 1
}
function _jobs_is_parallel_available_r() {
(( $(_jobs_get_count_e) < $g_jobs_max_jobs )) &&
return 0
return 1
}
function _jobs_wait_parallel() {
# Sleep between available jobs
while true; do
_jobs_is_parallel_available_r &&
break
sleep 0.1s
done
}
function _jobs_wait() {
wait
}
Example usage:
#!/bin/bash
source "_lib_jobs.sh"
_jobs_set_max_parallel 3
# Run 10 jobs in parallel with varying amounts of work
for a in {1..10}; do
_jobs_wait_parallel
# Sleep between 1-2 seconds to simulate busy work
sleep_delay=$(echo "scale=1; $(shuf -i 10-20 -n 1)/10" | bc -l)
( ### ASYNC
echo $a
sleep ${sleep_delay}s
) &
done
# Visualize jobs
while true; do
n_jobs=$(_jobs_get_count_e)
[[ $n_jobs = 0 ]] &&
break
sleep 0.1s
done
edited Jun 18 '17 at 4:26
answered Mar 16 '17 at 22:35
Zhro
In my case, I can't use semaphore (I'm in git-bash on Windows), so I came up with a generic way to split the tasks among N workers before they begin.
It works well if the tasks take roughly the same amount of time. The disadvantage is that if one of the workers takes a long time to do its part of the job, the others that have already finished won't help.
Splitting the job among N workers (1 per core)
# array of assets, assuming at least 1 item exists
listAssets=( {a..z} ) # example: a b c d .. z
# listAssets=( ~/"path with spaces/"*.txt ) # could be file paths

# replace with your task
task() { # $1 = idWorker, $2 = asset
    echo "Worker $1: Asset '$2' START!"
    # simulating a task that randomly takes 3-6 seconds
    sleep $(( ($RANDOM % 4) + 3 ))
    echo "    Worker $1: Asset '$2' OK!"
}

nVirtualCores=$(nproc --all)
nWorkers=$(( $nVirtualCores * 1 )) # I want 1 process per core

worker() { # $1 = idWorker
    echo "Worker $1 GO!"
    idAsset=0
    for asset in "${listAssets[@]}"; do
        # split assets among workers (using modulo); each worker will go through
        # the list and select the asset only if it belongs to that worker
        (( idAsset % nWorkers == $1 )) && task $1 "$asset"
        (( idAsset++ ))
    done
    echo "    Worker $1 ALL DONE!"
}

for (( idWorker=0; idWorker<nWorkers; idWorker++ )); do
    # start workers in parallel, use 1 process for each
    worker $idWorker &
done

wait # until all workers are done
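To adapt this to the runs from the question, only the asset list and the task body need to change; the worker loop above stays the same. A placeholder sketch (the real fsl5.0-* registration commands would go inside task, using "$2" as the run name):
listAssets=( run2 run3 run4 )

task() { # $1 = idWorker, $2 = run
    echo "Worker $1: registering '$2'..."
    sleep 2 # stand-in for the actual registration commands
    echo "    Worker $1: '$2' done"
}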
answered May 26 at 19:14
geekley
I really like the answer from @lev as it provides control over the maximum number of processes in a very simple manner. However, as described in the manual, sem does not work with brackets, so the commands have to be passed as a quoted string.
for stuff in things
do
sem -j +0 "something;
with;
stuff"
done
sem --wait
Does the job.
-j +N Add N to the number of CPU cores. Run up to this many jobs in parallel. For compute intensive jobs -j +0 is useful as it will run number-of-cpu-cores jobs simultaneously.
-j -N Subtract N from the number of CPU cores. Run up to this many jobs in parallel. If the evaluated number is less than 1 then 1 will be used. See also --use-cpus-instead-of-cores.
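For instance, the loop from the question could be driven the same way. A minimal runnable sketch, with placeholder commands standing in for the real registration pipeline:
for run in run2 run3 run4
do
    sem -j +0 "echo 'registering $run';
               sleep 2;
               echo '$run done'"
done
sem --wait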
answered Oct 11 at 11:58
moritzschaefer