Archive for February, 2009

Saturday's opening marks the countdown to the market's move to a new location in Tom McCall Waterfront Park, between the Willamette River and SW Naito Parkway, in April.

Recently I've really enjoyed reading blog posts which just explain a little bit of code, so that's what this is. I had this code lying around from a few months ago so I added some context and links. It combines two of my favourite things: Annodex and Haskell!

YouTube's video offset syntax

Some time last year, YouTube introduced a feature which allows you to specify a hyperlink that plays a video from a given time offset. If you used the syntax on a random video site, it would look like this:

http://www.example.com/player.html#t=3m54s

This syntax is very close to the one we use in Annodex for Temporal URIs, now running on Archive.org (and soon on Wikipedia):

http://www.example.com/video.ogv?t=3:54

Two differences:

1. YouTube uses a fragment instead of a query parameter.

A fragment is something starting with '#' that tells the client to jump to a particular offset in the document -- in general the fragment text is never seen by the server. In the case of YouTube the HTML page contains JavaScript that tells the embedded Flash video player to seek to the offset in the video.

Fragments are useful in this use case, where you are instructing the embedding web page to play the video from a given time offset. How it actually retrieves the video from the network is not specified, but importantly there is no requirement for the embedding web page to be reloaded.

(This distinction between fragments and queries is part of the W3C Media Fragments WG discussion on syntax.)

2. YouTube's syntax uses unit markers h, m, s to separate the parts of the time, whereas our specification uses the colon-separated kind of notation common in industrial equipment (and clock radios).

Perhaps one advantage of the format YouTube have chosen is readability: sometimes it is difficult to read times such as 03:36:14.

http://www.example.com/video.ogv?t=3:54
http://www.example.com/video.ogv?t=00:03:54.000
http://www.example.com/video.ogv?t=npt:00:03:54.000
http://www.example.com/video.ogv?t=smpte-25:00:03:54::0

We had a recent discussion about these issues in the Media Fragments WG: Action-28: updated syntax document with time formats. I'm pretty happy with the syntax we have settled on, allowing for both readable short timestamps and more accurate long ones.
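To make the two differences concrete, here is a minimal sketch that splits off a fragment (the part the server never sees) and parses both short time syntaxes into whole seconds. The names splitFragment, parseUnits, parseColons and splitOn are made up for this illustration and are not part of Annodex or any specification; fractional seconds and the npt:/smpte- prefixes are deliberately left out.

```haskell
import Data.Char (isDigit)

-- Split a URI into the part sent to the server and the client-side fragment.
-- A query (?t=...) stays in the request; a fragment (#t=...) does not.
splitFragment :: String -> (String, Maybe String)
splitFragment uri = case break (== '#') uri of
  (request, '#':frag) -> (request, Just frag)
  (request, _)        -> (request, Nothing)

-- Seconds per unit marker in the YouTube-style form.
unitSeconds :: Char -> Maybe Integer
unitSeconds 'h' = Just 3600
unitSeconds 'm' = Just 60
unitSeconds 's' = Just 1
unitSeconds _   = Nothing

-- Parse an offset such as "3m54s" into seconds.
-- The order of the units is not checked.
parseUnits :: String -> Maybe Integer
parseUnits "" = Nothing
parseUnits str = go 0 str
  where
    go acc "" = Just acc
    go acc s  = case span isDigit s of
      (ds@(_:_), u:rest) -> do
        k <- unitSeconds u
        go (acc + read ds * k) rest
      _ -> Nothing

-- Parse a colon-separated offset such as "3:54" or "00:03:54" into seconds.
parseColons :: String -> Maybe Integer
parseColons s
  | length fields <= 3 && all valid fields =
      Just (foldl (\acc f -> acc * 60 + read f) 0 fields)
  | otherwise = Nothing
  where
    fields  = splitOn ':' s
    valid f = not (null f) && all isDigit f

-- Split a string on a separator character, keeping empty fields.
splitOn :: Char -> String -> [String]
splitOn c s = case break (== c) s of
  (a, _:rest) -> a : splitOn c rest
  (a, [])     -> [a]
```

Both parseUnits "3m54s" and parseColons "3:54" come out as Just 234, so the two notations carry the same information; the difference is only in where the offset travels and how it reads.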

Pretty printing of durations

Anyway, I was bored so I hacked up a sweet fold to display the format used by YouTube.

Haskell hackers use folds like C programmers use for loops; the Haskell wiki page Fold is a beautiful introduction to the topic. My favourite Web 1.0 interactive visualization of a left fold is at foldl.com (and also be sure to check out its companion site for right folds, foldr.com).

Here's a concise fold that gets us most of the way to the right syntax:

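> -- Each unit label, paired with how many of that unit make up the next larger one.
> ts = [("ms", 1000), ("s", 60), ("m", 60), ("h", 24), ("d", 365)]
>
> -- Peel off one unit per step: label the remainder, pass the quotient on.
> duz ms = ss
>   where (ss, _) = foldl (\(ss, x) (s, y) -> (show (rem x y) ++ s ++ ss, quot x y)) ("", ms) ts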

Yeah, concise. Read it slow! If it were in C or Python, that one-liner would be a 10 or 5 line loop.

You might say that you use the fold function to iterate through a list of time units, and at each step of the iteration you do an integer division by the unit, label the remainder, and pass the quotient on to the next step of the iteration. A real Haskell programmer, however, might say something like "you fold the duration quotiently through the units, labelling into the syntax!", with much wringing of hands and wishful glances for abstract ponies. Fold is a verb, because functions are alive! Quotiently is not a word.
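If you want to watch the fold happen, swapping foldl for scanl keeps every intermediate accumulator instead of only the last one. This is a small sketch rather than part of the original code: the names units, step and steps are mine, and I am assuming hours are labelled "h" and days "d".

```haskell
-- Unit labels and how many of each make up the next larger unit (assumed labels).
units :: [(String, Integer)]
units = [("ms", 1000), ("s", 60), ("m", 60), ("h", 24), ("d", 365)]

-- One step of the fold: label the remainder, pass the quotient on.
step :: (String, Integer) -> (String, Integer) -> (String, Integer)
step (acc, x) (label, size) = (show (rem x size) ++ label ++ acc, quot x size)

-- Every intermediate (text so far, amount left over) pair, starting state included.
steps :: Integer -> [(String, Integer)]
steps ms = scanl step ("", ms) units
```

For 234000 milliseconds, steps gives ("", 234000), ("0ms", 234), ("54s0ms", 3), ("3m54s0ms", 0), ("0h3m54s0ms", 0) and finally ("0d0h3m54s0ms", 0); the fold just hands you that last pair.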

A problem with duz (apart from the crappy name) is that it prints every field, zero or not, so a time like 3m54s comes out padded with zero fields on both sides. The next implementation of duration strips the leading and trailing zeroes:

> dur ms = years:rest
>   where (rest, years) = foldl (\(ds, x) y -> ((rem x y):ds, quot x y)) ([], ms) [1000, 60, 60, 24, 365]
>
> duration ms = concat $ map (\(n, s) -> show n ++ s) (takeWhile (not . zero) $ dropWhile zero labelled)
>   where labelled = zip (dur ms) ["y", "d", "h", "m", "s", "ms"]
>         zero (n, _) = (n==0)

E.g., to display the duration of 2^32 milliseconds:

*Main> duration (2^32)
"49d17h2m47s296ms"
*Main> duration 3600000
"1h"

Fold is a generic list processing device; if you want to limit the amount of the list that is processed, you can use functions like takeWhile and dropWhile. These will take, or drop, elements from the list as long as some criterion is satisfied; you can use them both together to trim both the start and end of the list. Of course you can use these on the input list to limit what data is processed; but because Haskell evaluates lazily, you can also use these on the output list to limit how much of the processing is actually done (like in duration above). The bits of the evaluation that don't really need to get done, aren't: the idea of doing them is written down (on a "thunk") and thrown away. Burn your todo lists! Be lazy lazy lazy! Haskell rules. Do you like verbs?
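One thing to watch for with the takeWhile trick: it stops at the first zero field it meets, so a duration with a zero in the middle (one hour and five seconds, say) is cut off after the hours. If you would rather keep interior zeroes and only trim the ends, a possible sketch uses dropWhileEnd from Data.List. The name durationKeepingGaps is hypothetical, and dur is repeated from above so the block stands on its own.

```haskell
import Data.List (dropWhileEnd)

-- Same fold as before: [years, days, hours, minutes, seconds, milliseconds].
dur :: Integer -> [Integer]
dur ms = years : rest
  where
    (rest, years) =
      foldl (\(ds, x) y -> (rem x y : ds, quot x y)) ([], ms) [1000, 60, 60, 24, 365]

-- Trim zero fields from both ends only; zeroes between non-zero fields stay.
durationKeepingGaps :: Integer -> String
durationKeepingGaps ms = concatMap (\(n, s) -> show n ++ s) trimmed
  where
    labelled    = zip (dur ms) ["y", "d", "h", "m", "s", "ms"]
    zero (n, _) = n == 0
    trimmed     = dropWhileEnd zero (dropWhile zero labelled)
```

With this, durationKeepingGaps 3605000 gives "1h0m5s". The trade-off is that dropWhileEnd has to look past a zero to find out whether anything non-zero follows, so it cannot stop early the way takeWhile does.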

Shuttleworth Says Linux is a Joke

| February 28th, 2009
Linux is a joke. Well, that may be a bit harsh, but Ubuntu certainly seems to be all the excuse founder Mark Shuttleworth needs to make one bad pun after another. After Bill Gates' performances with Jerry Seinfeld, one wonders if becoming a billionaire tech mogul alters brain chemistry. At any rate, Paul Rubens reports on the future of Karmic Koala and Canonical. (Hint: not as successes in show biz.)
I'm taking PCWorld's new LinuxLink Blog under my wing. Being a mentor comes with great responsibility.
MakeTechEasier: "Quite possibly the most distinguishing feature of Debian-based Linux distributions (such as Ubuntu, Mepis, Knoppix, etc) is their package system - APT. Also known as the Advanced Package Tool, APT was first introduced in Debian 2.1 in 1999. APT is not so much a specific program as it is a collection of separate, related packages."
Have a confusing and entertaining Saturday :)

Ingredients:

  • some lame DNS server
  • logcheck
  • spamassassin

The last couple of days I've been plagued by some DNS errors that kept showing up in the logcheck mails for my home server which I was busy migrating from one box to another, doing an upgrade from etch/i386 to lenny/amd64 at the same time. So, plenty of stuff going on to confuse the issue.

I kept getting the following messages every hour (anonymized):

named: connection refused resolving 'somedomain.org/NS/IN': xxx.yyy.zzz.nnn#53
named: connection refused resolving 'somedomain.org/NS/IN': xxx.yyy.zzz.mmm#53
named: connection refused resolving 'ns1.somedomain.org/AAAA/IN': xxx.yyy.zzz.mmm#53
named: connection refused resolving 'ns2.somedomain.org/AAAA/IN': xxx.yyy.zzz.mmm#53
named: connection refused resolving 'ns1.somedomain.org/AAAA/IN': xxx.yyy.zzz.nnn#53
named: connection refused resolving 'ns2.somedomain.org/AAAA/IN': xxx.yyy.zzz.nnn#53

The times were fairly regular: once just before the hour, mostly 2 minutes after. I fetch mail at around that time, but also at other times, so that was a possible but unlikely cause. The 2 minutes after was the first real clue: some cron job maybe? After disabling logcheck the messages no longer appeared in the log. Enable it again, and they were back.

Additional confusion was caused by the fact that the domain had "debian" in its name, but it was somewhere obscure. So why was logcheck causing a lookup for that domain? This did confuse me enough to waste some time looking for some silly weird (default) configuration problem in some package.

Enter spamassassin. Apparently it was parsing the message body, recognized "somedomain.org" as a host name, and proceeded to do a DNS lookup as a validity check.

So we have the following loop, started off by something causing an initial DNS lookup for the domain, which fails and gets logged:

  • logcheck reports the failure during its next check
  • spamassassin processes the logcheck mail, spots the domain name and does a new set of lookups, which fail and get logged
  • logcheck reports the failures during its next check
  • ...

Duh.

I remember struggling with probably the same problem a couple of years ago, but then it was a lot more severe: masses of repeating DNS errors for obscure domains. At that time I failed to get to the bottom of it and ended up just ignoring the errors by adding the following option in my bind9 configuration:

logging {
    category lame-servers { null; };
};

Anyway, now I just no longer pass logcheck mails through spamassassin. (Although filtering out these DNS errors in bind9 can be perfectly valid.)

Luis de Bethencourt: a week in nyc

| February 28th, 2009
tomorrow morning i'm flying to new york city, and will spend a week there with my buddy alberto ruiz (from gnome fame and canary islanders messers club). any recommendations of things to see and people to do? (oops, i think i got that one backwards :P)

our plan is: no plan. impromptu style.

if you are around the manhattan area... don't hesitate and please, leave a comment or send me an email to luisbg [@] ubuntu [dot] com, we will be glad to be guided by some local geeks.

Fedora 10, an amazing Fedora release in its own right, had 28 approved features. Fedora 9 had 30 and Fedora 8 had 21.

As of this writing, Fedora 11 has 51 features already approved, plus another 9 waiting to be approved any day now. That means in the end there should be ~60 approved features which make it into Fedora 11! This doesn’t even count the work going into external things such as overhauling the documentation or the community work going into the Moksha project.

The features on the list aren’t trivial either; take a look. Almost every aspect of the OS is having some substantial work going into it. You name it: boot time, instant messaging, delta RPMs, 64 bit kernel in 32 bit system, the media player, networking, security, KMS for nearly all open source drivers, compilers, etc. will be seeing a lot of new love (much more than just upgrades). This list will have other distros playing catchup for some time to come.

So, show your appreciation by testing the new release. It may be best to start with the upcoming beta to be released on March 24, so mark your calendars. Even better, in addition to testing, sign up and become a part of the Fedora Project. You won’t regret it; the community and innovation you will come across will never be matched anywhere. Guaranteed.

Kelly O’Hair: Ant and Importing

| February 28th, 2009

Just how many copies of junit.jar have been added to source repositories on the planet? Quite a few I imagine, seems like a waste of repository data space and well, just wrong. Not junit, which is a fantastic product, just the fact that we have so many copies. Granted you have gained a pretty stable tool by freezing the version you have, and you have guaranteed having a copy at all times, but is it a good idea to add all these binary files to your repository data? As the list of tools like this grows and grows, does the "just add it to the repository" solution continue to scale? And each time you need a new version, you end up adding even more binary data to your repository.

Some projects have taken to doing a kind of "tools bootstrap" by downloading all the open source tools the first time you set up a repository, making the files immune from normal 'ant clean' actions. Ant has a task called <get> which can allow you to download tool bundles and it works quite well, but there are some catches to doing it this way. Expecting all the download sites to be up and available 24/7 is not realistic. And predictability is really important, so you want to make sure you always download the same version of the tools, keeping a record of what versions of the tools you use.

So what we did in the openjfx-compiler setup repository was to create an import/ directory to hold the downloaded tools, automate the population of that area with the <get> task, and also allow for quick population of import/ with a large import zip bundle. The initial version of the repository had a very similar mechanism, so this idea should be credited to the original authors on the OpenJFX Compiler Project.

This logic is contained in the file build-importer.xml of the setup repository and for each tool NAME downloaded, a set of properties is defined (import.NAME.*), and 2 ant tasks import-get-NAME and import-NAME. Probably best to look at the bottom of this file first. As before quite a few macrodefs were used to make this all work.


The ant build script then just uses ${import.junit.jar} to get a junit.jar file.

You can actually try this out yourself pretty easily if you have Mercurial (hg) and ant by doing this:

hg clone https://kenai.com/hg/openjfx-compiler~marina-setup setup
cd setup
ant import

Of course, I predict that it fails the first time for 50% or more of people; this kind of downloading is just not that reliable when it depends on all these sites. So you may have to run ant import a few times.

-kto