This is a working program that I use to scour through lots of text for keywords. I put the keywords in the terms file, one per line (keywords or key phrases, so they can contain spaces). Each line launches a recursive grep through whatever folder I need to search for those phrases. The best part is that every search/grep is launched at the same time, so you don't have to wait for one to finish before the next begins.
This is based on this article.
The main script has everything: it contains the other scripts as well, which are just commented out for redundancy. It needs the terms file alongside it (which you edit to include your terms). The other helper scripts are given separately too. You give the main script a <folder or file to search thru>, and it launches the searches that make the result files.
Each result file is named like this: _allSS_<term looked for>_<folder or file name given>.txt – you get one result file for every line in terms that holds a valid term.
A valid terms file looks like this:
this is a phrase1 that has term4
phrase2 has a space as well
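To make the idea concrete, here is a minimal sketch of the launch step, not the actual main script. The function name launch_searches and its argument order are made up for illustration; the result file naming follows the _allSS_ pattern described above. It assumes terms contain no "/" characters (they end up in file names) and that the result files are written outside the folder being searched.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: one background recursive grep per line of the terms file.
# launch_searches is an illustrative name, not taken from the original script.
launch_searches() {
    local terms_file="$1" target="$2"
    local name term
    name="$(basename "$target")"
    while IFS= read -r term || [ -n "$term" ]; do
        # Blank lines are not valid terms, so skip them.
        [ -z "$term" ] && continue
        # -r: recurse, -F: match the phrase literally, -e: the pattern follows.
        # The trailing & puts each grep in the background so they all run at once.
        grep -rF -e "$term" "$target" > "_allSS_${term}_${name}.txt" &
    done < "$terms_file"
    # This sketch waits for every grep to finish; drop the wait to get the
    # prompt back right away and watch progress while the searches run.
    wait
}
```

Calling it as launch_searches terms /some/folder would leave one _allSS_ file per term in the current directory.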
Then use any of the 3 monitoring scripts to watch progress:
./ – shows the processes and the last few lines of each result file (live, as terms/phrases are being found)
./ – shows the processes and the number of lines in each result file (the number of times each term was found, also live)
./ – shows only the process information that you get in the monitor1 and monitor2 scripts (plus system load and memory)
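As a rough idea of what the line-count style of monitoring does, here is a small sketch. It is not one of the monitoring scripts themselves; count_hits is a made-up name, and it assumes the result files sit in the current directory under the _allSS_ naming.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: print "<line count><TAB><file>" for every result file.
count_hits() {
    local f
    for f in _allSS_*.txt; do
        # If nothing matched the glob, the literal pattern is left; skip it.
        [ -e "$f" ] || continue
        # grep -c '' counts every line, including empty ones.
        printf '%s\t%s\n' "$(grep -c '' "$f")" "$f"
    done
}

# To refresh it live next to the process list (assumed loop, Ctrl-C to stop):
#   while sleep 2; do clear; ps -e -o pid,etime,args | grep '[g]rep -rF'; count_hits; done
```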
When you're done you can concatenate all of your results into one file with ./
Move them to a folder (which will be created for you) – a simple mkdir and mv script: ./ <folder name>
Let's say you launched a whole group of grep commands and they are already making result files, and you need to clean up. Run ./
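The concat and move steps are simple enough to sketch. This is only an approximation of the helper scripts: the function names, the header line between files and the output file name are all assumptions. The combined file should not itself match _allSS_*.txt, or it would be swept up with the results.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the concat helper: join every result file into one,
# with a header line naming the file each chunk came from.
concat_results() {
    local out="$1" f
    : > "$out"                      # start with an empty output file
    for f in _allSS_*.txt; do
        [ -e "$f" ] || continue     # no result files yet
        printf '===== %s =====\n' "$f" >> "$out"
        cat "$f" >> "$out"
    done
}

# Hypothetical sketch of the move helper: make the folder, then move results in.
# mv reports an error if there are no result files to move.
move_results() {
    local dest="$1"
    mkdir -p "$dest" && mv _allSS_*.txt "$dest"/
}
```

For example, concat_results all_results.txt followed by move_results run1 leaves the combined file in place and the individual result files in run1.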
./print.script just outputs what you see below (completely optional) – it formats the output with separators and gives statistics about each file (word count, line count, etc.)
./backup.script is my own script that I run after I modify any of the source code – makes it easy to bundle everything up


NOTE: the program below has every script. It's the main script, and it has all of the optional scripts commented out (for redundancy).

NOTE: added a script (and nice and high-priority variations of it) that runs X term searches at a time, then when those are done moves on to the next X terms, until it's finished. The default is 10 terms at a time (10 is used if the number of terms per job is not specified).
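The batching idea can be sketched like this. It is an illustration only, not the added script: run_in_batches and its third argument are assumed names, the default of 10 comes from the note above, and the priority variations are left out.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: run at most per_job greps at once, wait for that batch
# to finish, then start the next batch, until the terms file is exhausted.
run_in_batches() {
    local terms_file="$1" target="$2" per_job="${3:-10}"   # default: 10 terms per batch
    local name term running=0
    name="$(basename "$target")"
    while IFS= read -r term || [ -n "$term" ]; do
        [ -z "$term" ] && continue          # skip blank lines
        grep -rF -e "$term" "$target" > "_allSS_${term}_${name}.txt" &
        running=$((running + 1))
        if [ "$running" -ge "$per_job" ]; then
            wait                            # let the whole batch finish
            running=0
        fi
    done < "$terms_file"
    wait                                    # final, possibly partial, batch
}
```

Waiting for a whole batch is the simplest form of throttling. One slow term holds up the next batch, which is the trade-off for keeping the load predictable.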

