PhantomJS in a Bash Loop
The problem I recently tackled was quite simple: I have a list of URLs (plus some additional information) in a tab-separated file (.tsv) and want to run a PhantomJS script once per line, with parameters taken from that line.
So initially it seemed quite obvious to just open the file and use while read -r -a line to do that. That reads the next line of the file, splits it on whitespace (tabs included), and lets me use it as an array in the body of the loop. Works like a charm! However, as soon as I added my PhantomJS script, the loop stopped after the first iteration. After hours of testing different ways to read the file and figure out what was happening, it seemed clear that there was simply no line left to read once Phantom was done.
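One plausible explanation, offered here as an assumption rather than something verified against PhantomJS itself: a command started inside a while read loop inherits the loop's standard input, and if it reads from it, it swallows the remaining lines. The sketch below reproduces that effect with cat as a stand-in for phantomjs (the file contents and the greedy function are made up for illustration) and shows that giving the command its own stdin from /dev/null keeps the loop going.

```bash
#!/usr/bin/env bash
# Hypothetical sample data: a header plus three tab-separated rows.
urls_file=$(mktemp)
printf '%s\t%s\n' url name \
    http://example.com/a first \
    http://example.com/b second \
    http://example.com/c third > "$urls_file"

# Stand-in for phantomjs: a command that reads everything on its stdin.
greedy() { cat > /dev/null; }

broken=0
while IFS=$'\t' read -r -a line; do
    broken=$((broken + 1))
    greedy                  # inherits the loop's stdin and eats the rest of the file
done < "$urls_file"

fixed=0
while IFS=$'\t' read -r -a line; do
    fixed=$((fixed + 1))
    greedy < /dev/null      # the command gets its own, empty stdin
done < "$urls_file"

echo "iterations without redirect: $broken"
echo "iterations with redirect:    $fixed"
rm -f "$urls_file"
```

If the assumption holds for PhantomJS, appending < /dev/null to the phantomjs call would be an alternative to the sed workaround described next.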
My workaround ended up rather messy.
The idea was: instead of reading the file line by line, use a for loop and sed to fetch one specific line at a time. That worked as expected.
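A minimal sketch of that approach, not the exact script from this post: the sample file, the variable names and the commented-out phantomjs call (including the script name) are assumptions for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical input: a header plus three tab-separated rows.
file=$(mktemp)
printf '%s\t%s\n' url name \
    http://example.com/a first \
    http://example.com/b second \
    http://example.com/c third > "$file"

total=$(wc -l < "$file")    # number of lines, header included
names=()

# Start at 2 to skip the header line.
for ((num = 2; num <= total; num++)); do
    # Print line number $num and quit; every earlier line is deleted.
    row=$(sed "${num}q;d" "$file")

    # Split the row on tabs into an array.
    IFS=$'\t' read -r -a fields <<< "$row"

    # The real work would go here, for example (hypothetical script name):
    # phantomjs render.js "${fields[0]}" "${fields[1]}"
    echo "line $num: url=${fields[0]} name=${fields[1]}"
    names+=("${fields[1]}")
done

rm -f "$file"
```

Because the loop counter drives the iteration, nothing the inner command does to standard input can end the loop early. The price is that sed rereads the file from the top for every row, which is fine for short lists.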
I guess the key components are quite clear:
First you need the range of the for loop (starting at 2 to skip the header). Second, get the current line with sed "NUMq;d" FILE, where NUM is the number of the line you want to fetch (counting starts at one, not zero).
To parse the line into an array, you can still use read -r -a as before, just fed in a slightly different way.
One tip along the way: if you ever want to force-stop the script with CTRL+C, you will notice that it continues anyway (because the signal is trapped by PhantomJS).
A simple fix is to set up a trap in your shell script with trap "exit" INT.
Maybe one more: what if there was an error processing the page? You probably want to keep track of those cases so you can look at them later. Simply use phantom.exit(1) (or any other non-zero exit code) in your phantom script and, within your loop, right after the phantomjs call, add the magic line:
if [[ $? -ne 0 ]]; then echo "${LINE}" >> "${FAILED_FILE}"; fi
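Both tips can be sketched together as follows. The render function is a made-up stand-in for the phantomjs call so the example runs without PhantomJS, and the URLs and the failed_file location are likewise hypothetical.

```bash
#!/usr/bin/env bash
# Stop the whole loop on CTRL+C instead of moving on to the next row.
trap 'exit 130' INT

failed_file=$(mktemp)    # hypothetical place to collect failed rows

# Stand-in for `phantomjs script.js URL`: fails for URLs containing "bad",
# the way a phantom script calling phantom.exit(1) would.
render() {
    case "$1" in
        *bad*) return 1 ;;
        *)     return 0 ;;
    esac
}

for line in "http://example.com/ok" "http://example.com/bad" "http://example.com/fine"; do
    # Same effect as checking $? right after the call.
    if ! render "$line"; then
        printf '%s\n' "$line" >> "$failed_file"
    fi
done

failed=$(cat "$failed_file")
echo "failed rows: $failed"
rm -f "$failed_file"
```

Testing the command directly in the if avoids the risk of another command overwriting $? between the call and the check.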
I hope that helps someone somewhere out there.
Thanks for reading!