11 October 2006

Shell vs. AppleScript: 1-0

I added Google Analytics to this blog, just because I'm really curious and it really beats a boring hit counter (left-handed and twice on any weekday). Aside from the fact that I am totally enslaving myself and my online publications to Google (and happily doing that, which is weird), it was really easy: you just have to insert a few lines of JavaScript into every page. Of course, using a template-based blogging system makes that really simple. But then I wondered whether I could get Google to analyse my iWeb blog.

Of course I could try to change the internal iWeb templates, but that would be painful and I'd probably have to redo it every time iWeb gets an update. It would be nicer to work on the published pages. If you go to your iDisk you will see a "Sites" and a "Web" folder. The Sites folder was (is) used by the old web-based ".Mac HomePage" or can be used to publish self-made pages. And if you look into the "Web" folder, you see the code that iWeb generates. You can view and even modify the code there, and it will retain your modifications until you re-publish the site in iWeb.

So the problem is easy: open every HTML file and insert a code snippet before the </body> tag. Sounds like a job for a script. Fortunately AppleScript has this great support for filtering, and you can do that recursively through folders, too. Should be as simple as

get every file of entire contents of iWebBaseFolder where name ends with ".html"

It should be. Try this with any decently sized iWeb page and you will get a timeout error. Of course you can increase the timeout, but it seems wrong that AppleScript chokes on this. Note: I have mounted my iDisk the standard way, so it is using WebDAV, and you can tell by the delays this causes in Finder. If you have set your iDisk to synchronize with a local mirror, then this might actually work.

Of course finding the files is only the first part. Then you have to open each file, parse the text for the </body> tag, insert the code and save the file again. All of this is very painful in AppleScript.

Wait, isn't this what Unix is supposed to be good at? Let's try. The find part is easy

find /Volumes/idiskname/Web/Sites -name '*.html'

You can still see the names appearing one by one, but it is much better than the AppleScript solution. So how do we go about the text manipulation? The answer is sed (stream editor), which takes a stream of characters and somehow manipulates it using the magic incantations of regular expressions and things that the sed man page calls "functions" but which are basically single letters that are meaningful to the initiated and completely illegible to laypersons. Thankfully you can enter a nice search in Google (there it is again, I have no idea how I was able to learn programming entirely without Google) and find some examples:

# substitute "foo" with "bar" EXCEPT for lines which contain "baz"
sed '/baz/!s/foo/bar/g'
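To see what that incantation actually does, here is a small sketch that pipes three lines through it. The sample lines are made up for illustration and are not part of the examples I found.

```shell
# Three made-up sample lines; only the second one contains "baz".
result=$(printf '%s\n' 'foo one' 'foo baz two' 'foo foo three' | sed '/baz/!s/foo/bar/g')

# Lines 1 and 3 get every "foo" replaced; the "baz" line passes through untouched.
printf '%s\n' "$result"
```

The /baz/ part selects lines, the ! inverts that selection, and only then does the s command run on whatever is left.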

If the script already inserted the snippet then we won't need to insert it again. So some experimenting and much confusion led to:

sed -i .bak -e "/$textToInsert/!s/$textToReplace/$textToInsert&/g" filename

Where the $ prefix denotes variables I defined earlier in the script to turn it into something close to legible. What this command does is: if a line does not (!) contain $textToInsert, then substitute (s) $textToReplace with $textToInsert followed by the text we originally matched (&, that is, whatever $textToReplace matched), for every match on that line (g). The -i .bak option writes the result back into filename, keeping a copy with a .bak extension around in case all this gibberish turns out to produce... well, gibberish.
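A quick way to convince yourself that the guard works is to run the command twice on a throwaway file. This is only a sketch: the page contents and the short stand-in snippet are hypothetical, and I write -i.bak without the space because, as far as I know, that spelling is accepted by both the BSD sed on the Mac and GNU sed.

```shell
#!/bin/bash
# Hypothetical stand-ins for the real snippet and page, to keep the demo short.
textToInsert='<!-- tracking snippet -->'
textToReplace='<\/[Bb][Oo][Dd][Yy]>'

workDir=$(mktemp -d)
page="$workDir/page.html"
printf '%s\n' '<html><body>' '<p>Hello</p>' '</BODY></html>' > "$page"

# Run the very same command twice on the same file.
sed -i.bak -e "/$textToInsert/!s/$textToReplace/$textToInsert&/g" "$page"
sed -i.bak -e "/$textToInsert/!s/$textToReplace/$textToInsert&/g" "$page"

# Count every occurrence of the snippet; the /pattern/! guard should keep it at one.
snippetCount=$(grep -o -- "$textToInsert" "$page" | wc -l | tr -d ' ')
lastLine=$(tail -n 1 "$page")
echo "snippet count: $snippetCount"
echo "last line: $lastLine"
```

One thing to keep in mind: every run rewrites the .bak copy, so after the second run the backup already contains the snippet.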

Combine that with the find command from earlier and a nice -exec extension and you get the entire script:

#!/bin/bash

googleAnalyticsCode='enter your Google Analytics code number here'
textToInsert="<script src=\"http:\/\/www.google-analytics.com\/urchin.js\" type=\"text\/javascript\"><\/script><script type=\"text\/javascript\">_uacct = \"$googleAnalyticsCode\";urchinTracker();<\/script>"
textToReplace="<\/[Bb][Oo][Dd][Yy]>"
iWebBasePath='/Volumes/idiskname/Web/Sites'

# this is where the actual work happens
find "$iWebBasePath" -iname '*.html' -exec sed -i .bak -e "/$textToInsert/!s/$textToReplace/$textToInsert&/g" {} \; -print


Basically a one liner. I added the -print at the end of the command so I can see which files the script is working on. Otherwise you would get no feedback at all.
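If you would rather check the result than just watch file names scroll by, something like the following sketch lists the pages that still lack the snippet. The temporary folder and file names are made up for the demo, and searching for the string urchinTracker is my assumption about what identifies the snippet.

```shell
#!/bin/bash
# Throwaway tree standing in for the iWeb folder (paths are made up for the demo).
siteDir=$(mktemp -d)
mkdir -p "$siteDir/blog"
printf '%s\n' '<body><script>urchinTracker();</script></body>' > "$siteDir/index.html"
printf '%s\n' '<body><p>no snippet yet</p></body>' > "$siteDir/blog/entry.HTML"

# grep -L prints the names of files that do NOT contain the pattern.
missing=$(find "$siteDir" -iname '*.html' -exec grep -L 'urchinTracker' {} \;)
echo "still missing the snippet: $missing"
```

Pointed at the real iWebBasePath after the main script has run, this should print nothing.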

Admittedly very powerful. Armed with this new knowledge we can go ahead and write a script that removes the Google Analytics snippet again:

#!/bin/bash

googleAnalyticsCode='enter your Google Analytics code number here'
textToRemove="<script src=\"http:\/\/www.google-analytics.com\/urchin.js\" type=\"text\/javascript\"><\/script><script type=\"text\/javascript\">_uacct = \"$googleAnalyticsCode\";urchinTracker();<\/script>"
iWebBasePath='/Volumes/idiskname/Web/Sites'

# this is where the actual work happens
find "$iWebBasePath" -iname '*.html' -exec sed -i .bak -e "s/$textToRemove//g" {} \; -print

And (I bet you waited for this) a one-liner to remove all those pesky .bak files (after testing, of course):

find /Volumes/idiskname/Web/Sites -iname '*.bak' -exec rm {} \; -print

(Again, the -print is for the sole purpose of having something to watch.) And I know some smart guy will chime in here and say that xargs would be so much more efficient than -exec. That is true, but I will leave that for another day.
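For the impatient, here is a rough sketch of what the xargs variant could look like, run against a throwaway folder rather than the iDisk. The file names are made up, and -print0 and -0 are extensions that, to my knowledge, both the BSD tools on the Mac and the GNU tools provide.

```shell
#!/bin/bash
# Throwaway tree with a few fake backups (names made up for the demo).
siteDir=$(mktemp -d)
touch "$siteDir/index.html" "$siteDir/index.html.bak" "$siteDir/my page.html.bak"

# -print0 and -0 pass names NUL-separated, so spaces in file names survive;
# xargs then hands many names to a single rm instead of starting one rm per file.
find "$siteDir" -iname '*.bak' -print0 | xargs -0 rm

remaining=$(find "$siteDir" -type f | wc -l | tr -d ' ')
echo "files left: $remaining"
```

The saving comes from starting far fewer rm processes; whether that matters over a slow WebDAV mount is something I have not measured.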

I love AppleScript very much, but in this case the command line tools are way more efficient (though painful to learn). I guess the conclusion here should be: "Know your tools!"

6 comments:

Anonymous said...

hi,
like your scripts! Some typing errors;
where you mention

-iname it should read -name

*.html should read '*.html' (same counts for .bak)
thanks again

Armin said...

*.html and *.bak should be quoted. Thanks for catching that, I updated the article.

I used -iname for a purpose though, it does a case-insensitive compare as opposed to -name. As Mac OS X uses a case-insensitive filesystem (by default), this really makes sense.

Anonymous said...

Very good! Thank you.

http://web.mac.com/kdwedge

me said...

Hi Armin

Very useful tip now that I'm playing with iWeb. What I find interesting is the fact that it might be used to include other types of content without editing the html files. For that matter I've wrapped your script with Automator to ask for the analytics account and the folder where the pages reside. The idea is to later enhance it to include YouTube videos, for example.
The automator file can be found here:
document.wflow

Radmacdaddy said...

I would love to do all this, as I want to learn to optimize my iweb blog and site... but I am having a problem following all of it. I am not versed in html or much nerd speak... where do I start so I can grasp what you are describing?

Or is anyone building a scripting app to do what you are suggesting. With all the iWeb users out here it sure would be a good one!

Michael Terry said...

get every file of entire contents of iWebBaseFolder where name ends with ".html"

There's a workaround for the timeouts. You just set your text item delimiters to something funky, then do:

get every file of entire contents of iWebBaseFolder where name ends with ".html" as alias as string

...then split on the funky delimiter. Or something like that. I haven't done it in a few years and am sitting at a PC presently. Play around with it and you'll figure it out though. Someone should put up an AppleScript wiki to make these things less painful to discover.