Assumption: you are using bash 4. However, this will also work with bash 3, as I'm not using anything too special.
So you're copying a lot of things over to a remote system using rsync (over ssh, or a local rsync transfer) or cp. Let's say the connection is prone to disconnections (or the transfer fails a lot for whatever reason), but the situation allows the transfer to continue if you simply rerun the command. In other words: rsync, cp, or whatever command fails and exits with an error, so you rerun it to resume (assuming you use the command-line options that allow resuming; see below).
Using this simple bash script formula, you can make a command repeat over and over until it's successful:
while true; do if (COMMAND); then echo "success"; break; else echo "fail"; echo; fi; done
The COMMAND executes when the loop reaches the if line, and the if statement acts on the exit code that COMMAND returns. If the exit code is 0, the if statement evaluates as true and executes the top part, printing "success" and leaving the loop with a "break". If COMMAND returns a non-zero exit code, the if statement evaluates as false, the bottom part (the "else" part) executes and prints "fail", then the loop goes back around, runs COMMAND again, and repeats the whole thing. This continues until we get a success. NOTE: the command will run at least once.
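To watch the exit-code logic without a real transfer, here is a minimal sketch that wraps the loop in a function. The names retry_until_success and flaky are made up for this illustration and are not part of the one-liner above; flaky is only a stand-in for rsync or cp that fails twice and succeeds on the third call.

```shell
# retry_until_success: hypothetical helper name used only in this sketch.
# Runs the given command again and again until it exits with status 0.
retry_until_success() {
    while true; do
        if ( "$@" ); then
            echo "success"
            break
        else
            echo "fail"
        fi
    done
}

# flaky: stand-in for rsync/cp. It keeps a call counter in a temp file
# (the command runs in a subshell, so a plain variable would be lost)
# and returns non-zero on the first two calls, zero on the third.
counter_file=$(mktemp)
echo 0 > "$counter_file"
flaky() {
    local n
    n=$(( $(cat "$counter_file") + 1 ))
    echo "$n" > "$counter_file"
    [ "$n" -ge 3 ]
}

output=$(retry_until_success flaky)
echo "$output"      # prints "fail", "fail", "success" on separate lines
```

In real use you would call it as retry_until_success followed by your own command and arguments.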
SIDE NOTE on subshells: you can have several commands inside the parentheses; it's the exit status of the last command that matters. You can also put a subshell, the parentheses (), around the whole thing (from while true; do all the way to the final done;). This is useful because any variables created in a subshell only last inside that subshell and don't mess with your main bash environment (if you modify the script as I have for the rsync and cp examples below). I also use a subshell for the COMMAND (hence the parentheses around COMMAND), so that you can fit multiple commands (it's the last command's exit status that determines what the if statement will do). Note that with subshell syntax it's optional to put a semicolon after the last command inside the subshell, so both this syntax (COMMAND) and this syntax (COMMAND;) are correct. Spacing and newlines don't matter either. So this ( COMMAND ;) and this ( COMMAND; ) are correct, and also this:
(
COMMAND;
)
# In the example below: sleep 3 returns 0 (success), so sleep 3 runs only once and you see one success message
while true; do if (sleep 3); then echo "success"; break; else echo "fail"; echo; echo; fi; done

# In the example below: sleep 3 returns 0 (success). However, exit 123 returns 123, which is not 0 (so not success).
# So the if statement always evaluates false (not 0) and the bottom part plays out continuously, over and over
# (until you stop it with CONTROL-C or close the shell)
while true; do if (sleep 3; exit 123;); then echo "success"; break; else echo "fail"; echo; echo; fi; done
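The two subshell properties from the side note can be checked quickly. This is a small sketch with made-up variable names (first, second); it shows that a variable changed inside ( ) does not leak out, and that only the last command's exit status reaches the if statement.

```shell
N=100                                   # variable in the main shell

( N=0; N=$((N+1)); echo "inside subshell: N=$N" )
echo "after subshell:  N=$N"            # still 100, the subshell's N is gone

# Only the last command inside the parentheses decides which branch runs.
if ( false; true ); then first="then-branch"; else first="else-branch"; fi
if ( true; false ); then second="then-branch"; else second="else-branch"; fi

echo "( false; true ) -> $first"        # then-branch
echo "( true; false ) -> $second"       # else-branch
```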
Here is an rsync over ssh example (of course you can also just run rsync to a mounted folder, like a samba mount). To the mini script I added some things to get more valuable information in the output, and I'm also logging the output. I added a variable N to store the run number. N starts at 0 and is incremented at the top of the while loop with N=$((N+1)), so the first run is N=1 and the second run is N=2. The success and fail messages now show the run number, the date (date) and the epoch date (date +%s). At the end I send the output of the entire subshell to a log file (along with all of the errors). I can monitor the entire thing with tail: tail -f /var/log/backup-rsync.log. Also note that I'm running rsync with -P, which implies --progress (a visual progress bar in the output per file; it can't do a progress bar for the whole transfer) and also --partial (good for copies that need to be resumed, as it allows partial files to stay). It's important to select options that allow resumability:
(
    N=0;
    while true; do
        N=$((N+1));
        if (rsync -avzP -e "ssh -p 54321" /data/sb/sbrp.tar.7z* firstname.lastname@example.org:/data/server-backup/); then
            echo "success - run number $N - date `date` `date +%s`"
            break;
        else
            echo "fail - run number $N - date `date` `date +%s`"
            echo;
            sleep 300;
        fi;
    done;
) > /var/log/backup-rsync.log 2>&1
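If you want to check what the log will look like before pointing the loop at a real transfer, a dry run along these lines may help. It is only a sketch: fake_transfer is a hypothetical stand-in for the rsync command that fails twice, the log goes to a temp file instead of /var/log, and the sleep is shortened from 300 seconds to 1.

```shell
log_file=$(mktemp)          # stand-in for /var/log/backup-rsync.log
tries_file=$(mktemp)        # lets the stand-in remember how often it ran
echo 0 > "$tries_file"

# fake_transfer: hypothetical replacement for rsync; fails on runs 1 and 2.
fake_transfer() {
    local t
    t=$(( $(cat "$tries_file") + 1 ))
    echo "$t" > "$tries_file"
    [ "$t" -ge 3 ]
}

(
    N=0
    while true; do
        N=$((N+1))
        if (fake_transfer); then
            echo "success - run number $N - date $(date) $(date +%s)"
            break
        else
            echo "fail - run number $N - date $(date) $(date +%s)"
            echo
            sleep 1         # 300 in the real loop; shortened for the dry run
        fi
    done
) > "$log_file" 2>&1

cat "$log_file"             # two "fail" entries, then "success - run number 3 ..."
```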
NOTE add a wait time on failure: if the command (rsync, whatever) fails, I added in a sleep 300; (300 seconds = 5 minutes). It goes in the bottom part of the if (between the else and the fi). That way we give the system and the other systems some time to reestablish a working connection. Maybe the connection comes back in that time; if not, the command fails again and the loop goes back around. Eventually, when the connection comes back up, it will reconnect. Without the sleep, the only delay between attempts is rsync's own timeout mechanism (I counted about 127 seconds, roughly 2 minutes). With a sleep of 300 we add 300 seconds to those 127 seconds, so there is about a 7 minute wait between failed rsyncs.
NOTE about authentication with rsync over ssh: with the specific rsync operation above, it will ask for the password over and over (every time it has to resume). That's annoying, especially since we are redirecting all output to a log file (we won't even know it's asking). The point of a script that retries until it's successful is that you want minimal (ideally no) user input, so we don't want it to ask the user for anything [note that with a local rsync transfer on the same system we don't need any type of authentication]. The solution to the password problem with rsync over ssh is to get the authentication set up correctly. The above script implies I have properly set up ssh key authentication, so it doesn't ask for the password when the rsync command comes up (it assumes I copied the public key ~/.ssh/id_rsa.pub from the local system to server.com's /home/user1/.ssh/authorized_keys file, specifically appended that public key as a new line). Another way to take care of password authentication is sshpass, which automatically enters the password for you every time ssh, rsync or other similar ssh programs ask for it (less secure, because the password is on the command line or in a file), for example: if (sshpass -p 'somepassword' rsync -avzP -e "ssh -p 54321" /data/sb/sbrp.tar.7z* email@example.com:/data/server-backup/); then . Here is an article that covers the 3 methods (enter the password manually every time, use ssh keys, use sshpass): http://www.infotinks.com/compressing-and-sending-linux-filesystem-via-tar-and-7z-and-rsync/
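Since a hidden password prompt would stall the loop unnoticed, it can be worth checking up front that key-based login really works. The sketch below is one possible way to do that, not something from the original setup: can_login_without_password is a made-up function name, and the port and user@host in the usage comment are placeholders. It relies on the OpenSSH options BatchMode=yes (never prompt, fail instead) and ConnectTimeout.

```shell
# can_login_without_password: hypothetical helper for this sketch.
# Returns 0 only if ssh can log in and run "true" without prompting.
# Arguments: port, then user@host.
can_login_without_password() {
    local port=$1 target=$2
    ssh -p "$port" -o BatchMode=yes -o ConnectTimeout=10 "$target" true
}

# Example use before starting the retry loop (placeholders, not run here):
#   can_login_without_password 54321 user1@server.com || echo "set up ssh keys first"
```

The same -o BatchMode=yes could be added inside the -e "ssh -p 54321" string of the rsync command, so that a missing key makes rsync fail and get logged rather than wait for input.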
You can do the same thing with cp. With cp I use -u because that allows resumability (however, it doesn't resume from the file that failed, but from the next file):
(
    N=0;
    while true; do
        N=$((N+1));
        if (cp -ruv /data/ /mnt/data/); then
            echo "success - run number $N - date `date` `date +%s`"
            break;
        else
            echo "fail - run number $N - date `date` `date +%s`"
            echo;
            sleep 300;
        fi;
    done;
) > /var/log/backup-copy.log 2>&1
NOTE run rsync after cp (if cp fails): if I run cp and it fails over and over until it's successful, there will be some incomplete files. That is because cp doesn't resume on the file that failed but on the next file. cp only has this resume behavior if you use -u; without -u the copy starts again from the beginning of the file list. With -u, cp resumes from the next file, not the one it failed on, so the file it failed on stays partially transferred. rsync, on the other hand, resumes on the failed file (especially when you use --partial, which comes with -P, along with --progress, which is unrelated but useful for watching progress). Because of --partial, any partially transferred or changed files get fully transferred. So the solution is to run a similar rsync after the cp runs, to finish transferring the files that cp left partially transferred (and that didn't get across because cp failed for whatever reason). If that rsync operation fails as well, it will, upon resume, be intelligent as always and pick up where it left off. Here is an article showing how to construct the rsync and cp commands (note the format of the folders, so that you get the correct tree structure with rsync and cp): http://www.infotinks.com/cp-followed-by-rsync-to-copy-what-was-missed/
NOTE on the time it takes to resume: on a system with many files it takes some time for rsync and cp to figure out where they left off, so be patient. I assume rsync resume time is proportional to the size of the files and the number of files, whereas cp resume time is only proportional to the number of files.
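The partial-file problem can be reproduced locally. The sketch below assumes GNU cp (where -u copies only when the source is newer than the destination or the destination is missing) and uses temp directories and a made-up file name, big.file. It fakes an interrupted copy by writing a truncated file at the destination and then reruns cp -ru.

```shell
src=$(mktemp -d)
dst=$(mktemp -d)

printf 'complete file contents\n' > "$src/big.file"
touch -t 202001010000 "$src/big.file"   # give the source an old timestamp

# Simulate an interrupted copy: truncated file, newer timestamp than source.
printf 'compl' > "$dst/big.file"

cp -ru "$src/." "$dst/"                 # -u skips it: destination is newer

if cmp -s "$src/big.file" "$dst/big.file"; then
    result="identical"
else
    result="still partial"
fi
echo "after cp -u: $result"             # after cp -u: still partial
```

A following rsync run compares size and modification time by default, so it would notice the mismatch and transfer that file again.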
NOTE how to kill the task: if you copy-pasted the command into a shell, then you have to kill the parent process. Look for the bash process that has an rsync child or a sleep child (as we told the process to sleep after rsync fails), so you will see either bash->rsync or bash->sleep. You can find it with ps auxf or ps auxfww, or with pstree -p (-p shows the PIDs). Kill that process ID. Use a regular kill rather than going straight to kill -9; if you insist on a kill -9, do a regular kill first. A regular kill allows rsync to properly close off the rsync server on the remote side (although I'm sure the remote side has measures for dealing with peer-side rsyncs being closed off). You can then follow up the regular kill with a kill -9. So all in all just do this: kill <PID of parent BASH process that has RSYNC or SLEEP>; followed by kill -9 <PID of parent BASH process that has RSYNC or SLEEP>; . If you have made an executable script file out of this article's topic, then you can just kill that running script, followed by a kill -9 of that script: kill <PID of script>; kill -9 <PID of script>; . One of the benefits of running this as an executable script is that it's easier to find the PID you need to kill (you just look for the PID by the script's filename), whereas when you copy-paste it you need to sift through all of the bash processes and pick the right one to kill (after all, you don't want to kill the bash process that holds your main shell). NOTE that if you just kill the rsync process, it will start back up again, because that's the nature of this article's script.
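If the loop was started in the background from the shell you are still sitting in, there is a shortcut to hunting through ps output: bash keeps the PID of the last background job in $!. The sketch below is an illustration under that assumption, using a stand-in loop that never succeeds and short sleeps instead of rsync; loop_pid and loop_state are names chosen for the example.

```shell
# Stand-in retry loop in the background; the inner command always fails.
(
    while true; do
        if (sleep 1; exit 1); then
            break
        else
            sleep 1
        fi
    done
) > /dev/null 2>&1 &
loop_pid=$!                             # PID of the parent subshell running the loop

sleep 1                                 # let it run for a moment
kill "$loop_pid"                        # regular kill (SIGTERM) first
wait "$loop_pid" 2>/dev/null || true    # reap it; status is non-zero after a kill

if kill -0 "$loop_pid" 2>/dev/null; then
    loop_state="still running"
else
    loop_state="stopped"
fi
echo "loop is $loop_state"              # loop is stopped
```

This only works from the shell that launched the loop; from any other terminal you still need ps auxf or pstree -p to find the parent PID.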
EXAMPLE BY STORY
########################################################################
# HERE IS A STORY OF AN RSYNC TRANSFER OF 524 GIGS ACROSS THE INTERNET #
########################################################################
# "This is a true story. The events depicted in this code-snippet took place in California in 2015. At the request of the infotinks, the hostnames, passwords, and ports have been changed. Out of respect for security, the rest has been told exactly as it occurred."
# below is an example of how I transferred a system backup of 524 gigs
# I had 52 files numbered sbrp.tar.001 thru sbrp.tar.052, each sized 10 gigs
# I pasted script number 1 into the shell
# at first I transferred without any limits. it was going at 10 MiBps (it was bottlenecked by the download speed at the remote end; the local end could go faster than 10 MiBps on download and upload)
# after about 10 files finished, and it was in the middle of working on the 11th file (sbrp.tar.011), I decided I wanted the speed to be lower (so that the remote end isn't using up all of its download bandwidth on this, so that the remote end could do other things with speed and honor), so I added --bwlimit=2500 to the command.
# I killed the first script (which by the way you can kill by doing "ps ax" and kill -9 on the sshpass, rsync, ssh, and bash processes)
# it was like 28% done with the 11th file when I stopped it cold in its tracks
# after confirming that the first script was no longer running (again "ps ax"), I pasted script number 2
# it took less than 10 seconds to realize that it left off on the 11th file
# it then took about 1 minute to go up to the 28% mark and resume from that point (it was going from 0 to 28% at 24 MiBps, which is faster than my bandwidth; that's just how fast it figures things out by comparing checksums and metadata - this is why rsync is awesome)
# once it got to 28% it started transferring real data and you could see its transfer speed was at 2.5 MiBps (just as my --bwlimit was set)
# POINT OF THE STORY: whenever I want, I can kill script number 1 (or script number 2) and start the other one, and I know that it will resume, all within less than a minute's time.

# ---------------------------------------------------- #
# script number.1.                                     #
# ---------------------------------------------------- #
(N=0;
while true; do
    N=$((N+1))
    if (sshpass -p 'ZEBRTAS' rsync -avzP -e "ssh -p 2022" /data/sb/sbrp.tar.7z* server.com:/data/Main/nasdatabackup/); then
        echo "* success - run number $N - date `date` `date +%s`"
        break;
    else
        echo "* fail - run number $N - date `date` `date +%s`"
        echo; echo;
        sleep 300;
    fi;
done) > /data/sb/rsync.log 2>&1 &

# ---------------------------------------------------- #
# script number.2. --bwlimit=2500 (2.5MiBPs)           #
# ---------------------------------------------------- #
(N=0;
while true; do
    N=$((N+1))
    if (sshpass -p 'ZEBRTAS' rsync -avzP --bwlimit=2500 -e "ssh -p 2022" /data/sb/sbrp.tar.7z* server.com:/data/Main/nasdatabackup/); then
        echo "* success - run number $N - date `date` `date +%s`"
        break;
    else
        echo "* fail - run number $N - date `date` `date +%s`"
        echo; echo;
        sleep 300;
    fi;
done) > /data/sb/rsync.log 2>&1 &