Of course I could try to change the internal iWeb templates, but that would be painful and I'd probably have to redo that every time iWeb gets an update. It would be nicer to work on the published pages. If you go to your iDisk, you will see a "Sites" and a "Web" folder. The Sites folder was (is) used by the old web-based ".Mac HomePage" and can also be used to publish self-made pages. And if you look into the "Web" folder, you see the code that iWeb generates. You can view and even modify the code there, and it will retain your modifications until you re-publish the site in iWeb.
So the problem is easy: open every HTML file and insert a code snippet before the </body> tag. Sounds like a job for a script. Fortunately AppleScript has this great support for filtering, and you can do that recursively through folders, too. Should be as simple as
get every file of entire contents of iWebBaseFolder where name ends with ".html"
It should be. Try this with any decently sized iWeb site and you will get a timeout error. Of course you can increase the timeout, but it seems wrong that AppleScript chokes on this. Note: I have mounted my iDisk the standard way, so it is using WebDAV, and you can tell by the delays this causes in Finder. If you have set your iDisk to synchronize with a local mirror, then this might actually work.
Of course finding the files is only the first part. Then you have to open the text, parse it for the </body> tag, insert the code and save the file again. All of this is very painful in AppleScript.
Wait, isn't this what Unix is supposed to be good at? Let's try. The find part is easy:
find /Volumes/idiskname/Web/Sites -name '*.html'
You can still see the names appearing, but it is much better than the AppleScript solution. So how do we go about the text manipulation? The answer is sed (stream editor), which takes a stream of characters and manipulates it using the magic incantations of regular expressions and things that the sed man page calls "functions", which are basically single letters that are meaningful to the initiated and completely illegible to laypersons. Thankfully you can enter a nice search in Google (there it is again, I have no idea how I was able to learn programming entirely without Google) and find some examples:
# substitute "foo" with "bar" EXCEPT for lines which contain "baz"
sed '/baz/!s/foo/bar/g'
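To see what that example does before pointing it at real pages, here is a minimal sketch that feeds it three made-up lines. The function name replace_unless_baz and the sample text are my own placeholders, not part of the original example.

```shell
#!/bin/bash
# Replace foo with bar on every line that does NOT contain baz.
# The address /baz/ selects lines containing baz; the ! negates it.
replace_unless_baz() {
  sed '/baz/!s/foo/bar/g'
}

# Only the second line mentions baz, so it should pass through untouched.
printf '%s\n' 'foo one' 'foo baz two' 'foo foo three' | replace_unless_baz
# prints:
#   bar one
#   foo baz two
#   bar bar three
```

The g at the end is why both occurrences of foo on the third line are replaced.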
That exception is exactly what we need: if the script has already inserted the snippet into a page, then we don't need to insert it again. So some experimenting and much confusion led to:
sed -i .bak -e "/$textToInsert/!s/$textToReplace/$textToInsert&/g" filename
where the $ prefix denotes variables I defined earlier in the script to turn it into something close to legible. What this command does is: if a line does not (!) contain $textToInsert, then substitute (s) $textToReplace with $textToInsert followed by the text we originally searched for (the & stands for whatever $textToReplace matched), for every match on that line (g). The -i .bak option makes sed write the result back into filename, keeping a copy with a .bak extension around in case all this gibberish turns out to produce... well, gibberish.
Combine that with the find command from earlier and a nice -exec extension and you get the entire script:
#!/bin/bash
googleAnalyticsCode='enter your Google Analytics code number here'
textToInsert="<script src=\"http:\/\/www.google-analytics.com\/urchin.js\" type=\"text\/javascript\"><\/script><script type=\"text\/javascript\">_uacct = \"$googleAnalyticsCode\";urchinTracker();<\/script>"
textToReplace="<\/[Bb][Oo][Dd][Yy]>"
iWebBasePath='/Volumes/idiskname/Web/Sites'
# this is where the actual work happens
find "$iWebBasePath" -iname '*.html' -exec sed -i .bak -e "/$textToInsert/!s/$textToReplace/$textToInsert&/g" {} \; -print
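If you want to convince yourself that running the script twice does no harm, a small dry run helps. The sketch below is an assumption-laden stand-in, not the script itself: it uses a short HTML comment as a hypothetical marker instead of the real Analytics snippet, reads from standard input instead of editing files in place (so it does not depend on how your sed handles -i), and uses | as the delimiter of the s command so the slash in </body> needs no escaping.

```shell
#!/bin/bash
# Hypothetical marker standing in for the real Analytics snippet.
snippet='<!-- tracking snippet -->'

# Insert $snippet before </body> unless the line already contains it.
# The s command uses | as its delimiter, so </body> needs no backslashes.
insert_snippet() {
  sed "/$snippet/!s|</[Bb][Oo][Dd][Yy]>|$snippet&|g"
}

page='<html><body><p>Hello</p></body></html>'
once=$(printf '%s\n' "$page" | insert_snippet)
twice=$(printf '%s\n' "$once" | insert_snippet)

printf '%s\n' "$once"
# prints: <html><body><p>Hello</p><!-- tracking snippet --></body></html>

# The second pass finds the marker on the line and leaves it alone.
[ "$once" = "$twice" ] && echo "second pass changed nothing"
```

Note that the check is per line: a page that already carries the snippet on a different line than </body> would still get a second copy.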
The script itself is basically a one-liner. I added the -print at the end of the find command so I can see which files the script is working on. Otherwise you would get no feedback at all.
Admittedly very powerful. Armed with this new knowledge we can go ahead and write a script that removes the Google Analytics snippet again:
#!/bin/bash
googleAnalyticsCode='enter your Google Analytics code number here'
textToRemove="<script src=\"http:\/\/www.google-analytics.com\/urchin.js\" type=\"text\/javascript\"><\/script><script type=\"text\/javascript\">_uacct = \"$googleAnalyticsCode\";urchinTracker();<\/script>"
iWebBasePath='/Volumes/idiskname/Web/Sites'
# this is where the actual work happens
find "$iWebBasePath" -iname '*.html' -exec sed -i .bak -e "s/$textToRemove//g" {} \; -print
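The removal can be tried out the same way on a single made-up line. This is only a sketch: the HTML comment is again a hypothetical marker in place of the real snippet, and the function reads standard input rather than editing a file.

```shell
#!/bin/bash
# Hypothetical marker standing in for the real Analytics snippet.
snippet='<!-- tracking snippet -->'

# Delete every occurrence of $snippet by substituting it with nothing.
remove_snippet() {
  sed "s|$snippet||g"
}

printf '%s\n' '<p>Hello</p><!-- tracking snippet --></body>' | remove_snippet
# prints: <p>Hello</p></body>
```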
And (I bet you waited for this) here is a one-liner to remove all those pesky .bak files (after testing, of course):
find /Volumes/idiskname/Web/Sites -iname '*.bak' -exec rm {} \; -print
(Again, the -print is for the sole purpose of having something to watch.) And I know some smart guy will chime in here and say that xargs would be so much more efficient than -exec, and that is true, but I will leave that for another day.
I love AppleScript very much, but in this case the command line tools are way more efficient (though painful to learn). I guess the summary here should be: "Know your tools!"
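For the impatient, the xargs variant of the cleanup might look roughly like the sketch below. It is not tested against an iDisk; it works on a throwaway directory created with mktemp, and the function name remove_backups and the sample file names are made up. The -print0 and -0 options pass file names separated by NUL characters, so names with spaces survive, and xargs hands many names to a single rm instead of starting one rm per file.

```shell
#!/bin/bash
# Delete every *.bak file below the given directory.
# find emits NUL-separated names; xargs -0 batches them into few rm calls.
remove_backups() {
  find "$1" -type f -iname '*.bak' -print0 | xargs -0 rm -f
}

# Try it on a scratch directory rather than on the real site.
scratch=$(mktemp -d)
touch "$scratch/index.html" "$scratch/index.html.bak" "$scratch/My Page.html.bak"

remove_backups "$scratch"
ls "$scratch"
# prints: index.html
```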
6 comments:
hi,
like your scripts! Some typing errors;
where you mention
-iname it should read -name
*.html should read '.html' (same counts for .bak)
thanks again
*.html and *.bak should be quoted. Thanks for catching that, I updated the article.
I used -iname for a purpose though, it does a case-insensitive compare as opposed to -name. As Mac OS X uses a case-insensitive filesystem (by default), this really makes sense.
Very good! Thank you.
http://web.mac.com/kdwedge
Hi Armin
Very useful tip now that I'm playing with iWeb. What I find interesting is that it might be used to include other types of content without editing the HTML files by hand. For that matter, I've wrapped your script in an Automator workflow that asks for the Analytics account and the folder where the pages reside. The idea is to later enhance it to include YouTube videos, for example.
The automator file can be found here:
document.wflow
I would love to do all this, as I want to learn to optimize my iweb blog and site... but I am having a problem following all of it. I am not versed in html or much nerd speak... where do I start so I can grasp what you are describing?
Or is anyone building a scripting app to do what you are suggesting? With all the iWeb users out here it sure would be a good one!
get every file of entire contents of iWebBaseFolder where name ends with ".html"
There's a workaround for the timeouts. You just set your text item delimiters to something funky, then do:
get every file of entire contents of iWebBaseFolder where name ends with ".html" as alias as string
...then split on the funky delimiter. Or something like that. I haven't done it in a few years and am sitting at a PC presently. Play around with it and you'll figure it out though. Someone should put up an AppleScript wiki to make these things less painful to discover.