I added Google Analytics to this blog, just because I'm really curious and it really beats a boring hit counter (left handed and twice on any week-day). Aside from the fact that I am totally enslaving myself and my online publications to Google (and happily doing that, which is weird), it was really easy: you just have to insert a few lines of JavaScript into every page. Of course, using a template-based blogging system makes that really simple. But then I wondered whether I could get Google to analyse my iWeb blog.
Of course I could try to change the internal iWeb templates, but that would be painful and I'd probably have to redo that every time iWeb gets an update. It would be nicer to work on the published pages. If you go to your iDisk, you will see a "Sites" and a "Web" folder. The Sites folder was (is) used by the old web-based ".Mac HomePage" and can also be used to publish self-made pages. If you look into the "Web" folder, you see the code that iWeb generates. You can view and even modify the code there, and it will retain your modifications until you re-publish the site in iWeb.
So the problem is easy: open every HTML file and insert a code snippet before the
</body>
tag. Sounds like a job for a script. Fortunately, AppleScript has great support for filtering, and you can do that recursively through folders, too. It should be as simple as
get every file of entire contents of iWebBaseFolder where name ends with ".html"
It should be. Try this with any decently sized iWeb site and you will get a timeout error. Of course you can increase the timeout, but it seems wrong that AppleScript chokes on this. Note: I have mounted my iDisk the standard way, so it is using WebDAV, and you can tell by the delays this causes in Finder. If you have set your iDisk to synchronize with a local mirror, then this might actually work.
Of course, finding the files is only the first part; then you have to open the text, parse it for the
</body>
tag, insert the code, and save the file again. All of this is very painful in AppleScript.
Wait, isn't this what Unix is supposed to be good at? Let's try. The find part is easy
find /Volumes/idiskname/Web/Sites -name '*.html'
You can still see the names appearing but it is much better than the AppleScript solution. So how do we go about the text manipulation? The answer is
sed
(stream editor) which takes a stream of characters and somehow manipulates this using the magic incantations of regular expressions and things that the
sed
man page calls "functions" but which are basically single letters that are meaningful to the initiated and completely illegible to laypersons. Thankfully you can enter a nice search in Google (there it is again; I have no idea how I was able to learn programming entirely without Google) and find
some examples:
# substitute "foo" with "bar" EXCEPT for lines which contain "baz"
sed '/baz/!s/foo/bar/g'
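To see what that example actually does, here is a minimal sketch that feeds it three sample lines. The sample text is invented for illustration; only the sed expression comes from the example above.

```shell
#!/bin/bash
# Three invented sample lines; only lines without "baz" should be rewritten.
result=$(printf 'foo one\nfoo baz\nfoo foo\n' | sed '/baz/!s/foo/bar/g')
printf '%s\n' "$result"
# prints:
#   bar one
#   foo baz
#   bar bar
```

The address /baz/ selects lines, the ! inverts that selection, and the g flag replaces every match on each remaining line.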
If the script has already inserted the snippet, we don't need to insert it again. So some experimenting and much confusion led to:
sed -i .bak -e "/$textToInsert/!s/$textToReplace/$textToInsert&/g" filename
Where the $ prefix denotes variables I defined earlier in the script to turn it into something close to legible. What this command does is: if a line does not (!) contain $textToInsert, then substitute (s) $textToReplace with $textToInsert and append the text we originally searched for ($textToReplace, &), for every match on that line (g), in filename, and then write the result back into the file, keeping a copy with a .bak extension around in case all this gibberish happens to produce... well, gibberish.
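A quick way to convince yourself that the "don't insert twice" guard works is to run the command two times on a scratch file. This is only a sketch under some assumptions: the snippet is a short hypothetical stand-in comment rather than the real tracking code, the test page is made up, and I write -i.bak without a space because that form is accepted by both the BSD sed on the Mac and GNU sed (the spaced -i .bak form is the BSD one).

```shell
#!/bin/bash
# Hypothetical stand-in for the real tracking snippet (assumption for the demo).
textToInsert='<!-- tracking snippet -->'
# Same pattern as in the article: a closing body tag in any letter case.
textToReplace='<\/[Bb][Oo][Dd][Yy]>'

workDir=$(mktemp -d)
page="$workDir/index.html"
printf '<html><body>\n<p>Hello</p>\n</BODY></html>\n' > "$page"

# Run the same command twice; the second run should change nothing.
for run in 1 2; do
  sed -i.bak -e "/$textToInsert/!s/$textToReplace/$textToInsert&/g" "$page"
done

snippetCount=$(grep -c 'tracking snippet' "$page")
lastLine=$(tail -n 1 "$page")
echo "lines containing the snippet: $snippetCount"
echo "$lastLine"
# prints:
#   lines containing the snippet: 1
#   <!-- tracking snippet --></BODY></html>

rm -rf "$workDir"
```

One thing to keep in mind: every run rewrites the .bak file, so after a second run the backup already contains the snippet and is no longer a copy of the untouched page.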
Combine that with the find command from earlier and a nice -exec extension and you get the entire script:
#!/bin/bash
googleAnalyticsCode='enter your Google Analytics code number here'
textToInsert="<script src=\"http:\/\/www.google-analytics.com\/urchin.js\" type=\"text\/javascript\"><\/script><script type=\"text\/javascript\">_uacct = \"$googleAnalyticsCode\";urchinTracker();<\/script>"
textToReplace="<\/[Bb][Oo][Dd][Yy]>"
iWebBasePath='/Volumes/idiskname/Web/Sites'
# this is where the actual work happens
find "$iWebBasePath" -iname '*.html' -exec sed -i .bak -e "/$textToInsert/!s/$textToReplace/$textToInsert&/g" {} \; -print
Basically a one liner. I added the
-print
at the end of the command so I can see which files the script is working on. Otherwise you would get no feedback at all.
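Before letting a script edit files in place over a slow WebDAV mount, it can be reassuring to do a dry run first. The following is a sketch, not part of the script above: it uses grep -L to list the pages that do not yet mention urchin.js, which are the ones the insert script would still touch. The scratch folder and file names are made up and stand in for the real iDisk path.

```shell
#!/bin/bash
# Throwaway tree standing in for the iDisk Web folder (hypothetical layout).
siteDir=$(mktemp -d)
mkdir -p "$siteDir/Blog"
printf '<html><body></body></html>\n' > "$siteDir/index.html"
printf '<html><body><script src="http://www.google-analytics.com/urchin.js"></script></body></html>\n' > "$siteDir/Blog/entry.html"
printf 'not a web page\n' > "$siteDir/notes.txt"

# grep -L prints the names of files that do NOT contain the pattern,
# i.e. the pages that still lack the snippet. Nothing is modified.
pending=$(find "$siteDir" -iname '*.html' -exec grep -L 'urchin\.js' {} +)
printf '%s\n' "$pending"

rm -rf "$siteDir"
```

Here only index.html is listed, because Blog/entry.html already contains the snippet and notes.txt is not an HTML file.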
Admittedly, sed is very powerful. Armed with this new knowledge we can go ahead and write a script that removes the Google Analytics snippet again:
#!/bin/bash
googleAnalyticsCode='enter your Google Analytics code number here'
textToRemove="<script src=\"http:\/\/www.google-analytics.com\/urchin.js\" type=\"text\/javascript\"><\/script><script type=\"text\/javascript\">_uacct = \"$googleAnalyticsCode\";urchinTracker();<\/script>"
iWebBasePath='/Volumes/idiskname/Web/Sites'
# this is where the actual work happens
find "$iWebBasePath" -iname '*.html' -exec sed -i .bak -e "s/$textToRemove//g" {} \; -print
and (I bet you waited for this) a one-liner to remove all those pesky
.bak
files (after testing of course):
find /Volumes/arminb/Web/Sites -iname '*.bak' -exec rm {} \; -print
(Again, the -print is for the sole purpose of having something to watch.) And I know some smart guy will chime in here and say that
xargs
would be so much more efficient than
-exec
and that is true but I will leave that for another day.
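For the impatient, here is a minimal sketch of what the xargs variant of the cleanup could look like. It runs against a scratch folder with invented file names rather than the real iDisk, and it is one possible form, not a tested replacement for the one-liner above.

```shell
#!/bin/bash
# Throwaway tree with a few backup files; the names are invented for the demo.
siteDir=$(mktemp -d)
mkdir -p "$siteDir/My Blog"
touch "$siteDir/index.html" "$siteDir/index.html.bak" "$siteDir/My Blog/entry.html.bak"

# -print0 and -0 separate names with NUL bytes, so spaces in paths are safe.
# xargs hands many names to a single rm instead of starting one rm per file.
find "$siteDir" -iname '*.bak' -print0 | xargs -0 rm -f

remainingBak=$(find "$siteDir" -iname '*.bak' | wc -l | tr -d ' ')
remainingHtml=$(find "$siteDir" -iname '*.html' | wc -l | tr -d ' ')
echo "bak files left: $remainingBak, html files left: $remainingHtml"
# prints: bak files left: 0, html files left: 1

rm -rf "$siteDir"
```

Ending the -exec clause with + instead of \; (as in -exec rm {} +) batches the file names in a similar way without needing xargs at all.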
I love AppleScript very much, but in this case the command line tools are way more efficient (though painful to learn). I guess the moral here should be:
"Know your tools!"