Tonight I attended the SDForum Web Services SIG meeting whose
topic was “Semantic XHTML — Can your website be your API?”.
The presenters were Kevin Marks and
Tantek Çelik from Technorati.
Following are my rough notes from this interesting presentation.
Update 2004-10-05: Slides from this talk now posted on Tantek’s site
Semantic XHTML
Can your website be your API?
SDForum Web Services SIG, 2004-09-28
Some SDForum general topics:
* Monthly Web Services Working Group will probably be formed in a couple months
* Forming a new Web Client SIG, topics to include RSS, Atom, SOAP, REST, etc.;
looking for a host
* New PayPal Hacks book coming out
Background on Technorati
Tracking 4 million blogs now (was 3 million in June). About 4 million posts per
week. New Politics site tracks and summarizes about 10,000 political blogs.
Link analysis is the key attribute of their processing. For international
blogs, they use UTF-8 internally and can convert from the majority of encodings
as needed. There is not as much content searching yet for international blogs,
but that is less critical for now because they rely on links rather than content.
Presentation
HTML started structured, became presentational during browser wars. Explosive
growth because of error tolerance. Table abuse & font tagitis & spacer GIF
layouts caused two backlashes:
- Backlash for structure: XML; draconian error checking, freedom to make
own schemas, appeals to programmers
- Backlash for layout: CSS; move presentation away from structure, content
independence, appeals to designers, http://www.csszengarden.com
Where does XML fail?
- schema explosion (everyone makes their own)
- tag/attribute battling
- abstraction ratholes – BTO ontology
- not human readable (partly by design)
- doesn’t work on “the Web” today
Where does CSS fail?
- folk coding (design rather than engineering community)
- variable implementations
- visual designers thinking about presentation as structure
- structure hacks to fix presentation
Can we re-integrate these strands?
- XHTML is XML (XHTML = HTML made into XML)
- parseable, modular
- XHTML supports CSS
- everyone already has a viewer
- everyone can make queries
Example – Politics Site. Sample problem:
- wanted a chart of the top 3 links on a page
- dynamically generated using some complex app logic to choose
the link title based on transient data
- solution: use the site output page as input, easily parsable
to extract desired information
- this web page wasn’t originally designed with that in mind,
but due to its structure was reusable
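A rough sketch of what "use the site output page as input" could look like. The page fragment, the `top-links` id, and the `top_links` helper are all made up for illustration; the notes do not show the actual Politics site markup. The only real assumption is that the page is well-formed XHTML, so a stock XML parser can read it.

```python
import xml.etree.ElementTree as ET

# XHTML elements live in this namespace when the page declares it.
XHTML_NS = "{http://www.w3.org/1999/xhtml}"

# Hypothetical page fragment, not the real Politics site output.
PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <ol id="top-links">
      <li><a href="http://example.com/a">Story A</a></li>
      <li><a href="http://example.com/b">Story B</a></li>
      <li><a href="http://example.com/c">Story C</a></li>
      <li><a href="http://example.com/d">Story D</a></li>
    </ol>
  </body>
</html>"""


def top_links(xhtml, list_id="top-links", count=3):
    """Return (title, href) pairs for the first `count` links in the named list."""
    root = ET.fromstring(xhtml)
    for ol in root.iter(XHTML_NS + "ol"):
        if ol.get("id") == list_id:
            anchors = ol.findall(f"{XHTML_NS}li/{XHTML_NS}a")
            return [(a.text, a.get("href")) for a in anchors[:count]]
    return []  # list not present on the page


for title, href in top_links(PAGE):
    print(title, href)
```

The app logic that picked the link titles stays on the server; the consumer only needs the rendered page.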
XHTML building blocks
- most applications reuse a lot of common concepts
- strings
- lists, correspond to program arrays (<ol> and <ul>)
- tables, can be used for 2D array
- links with ‘rel’ attribute explicitly defines relationship;
is extensible and multivalued
- definition lists, key/value pairs or hashtables
- citations and quotes; cite a person or source by name,
popular use in weblogs
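To make the correspondence with program data structures concrete, here is a small sketch that reads a list, a definition list, and a table into a Python list, dict, and 2D list. The fragment and helper names are invented for this example, the markup is left un-namespaced to keep it short, and the definition-list helper assumes exactly one `<dd>` per `<dt>`.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment using three of the building blocks.
FRAGMENT = """<div>
  <ol><li>first</li><li>second</li></ol>
  <dl><dt>name</dt><dd>Ada</dd><dt>city</dt><dd>London</dd></dl>
  <table>
    <tr><td>1</td><td>2</td></tr>
    <tr><td>3</td><td>4</td></tr>
  </table>
</div>"""


def list_items(root):
    """<ol>/<ul> items become an array."""
    return [li.text for li in root.find("ol").findall("li")]


def definitions(root):
    """<dl> becomes key/value pairs (assumes one <dd> per <dt>)."""
    dl = root.find("dl")
    keys = [dt.text for dt in dl.findall("dt")]
    values = [dd.text for dd in dl.findall("dd")]
    return dict(zip(keys, values))


def table_rows(root):
    """<table> becomes a 2D array of cell text."""
    return [[td.text for td in tr.findall("td")]
            for tr in root.find("table").findall("tr")]


root = ET.fromstring(FRAGMENT)
print(list_items(root))
print(definitions(root))
print(table_rows(root))
```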
Existing examples
- XFN – XHTML friends network; just add ‘rel’ to your
blogroll links; define profile using a dictionary:
http://gmpg.org/xfn/1
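As an illustration of how multivalued 'rel' attributes might be read back out of a blogroll, here is a minimal sketch. The blogroll markup, URLs, and the `relationships` helper are hypothetical; the rel values shown (friend, met, colleague) are ones the XFN profile defines.

```python
import xml.etree.ElementTree as ET

# Hypothetical blogroll; only the links carrying rel values describe a relationship.
BLOGROLL = """<ul>
  <li><a href="http://example.org/alice" rel="friend met">Alice</a></li>
  <li><a href="http://example.org/bob" rel="colleague">Bob</a></li>
  <li><a href="http://example.org/news">News site</a></li>
</ul>"""


def relationships(xhtml):
    """Map each link URL to its list of rel values, skipping links without rel."""
    root = ET.fromstring(xhtml)
    result = {}
    for a in root.iter("a"):
        rel = a.get("rel")
        if rel:
            # rel is space-separated, so one link can carry several values.
            result[a.get("href")] = rel.split()
    return result


print(relationships(BLOGROLL))
```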
Future example
- attention.xml; what are you reading, how often are you
reading them, etc. with goal of application that can
help synchronize what you’re reading, help highlight
things that you are interested in
- XSPF – play lists (XML shared playlist format)
New types – Methodology
- map existing data structures into XHTML equivalents
- enable new stylable building blocks
- readily exchange data as mapping is 1:1
New type – People
- RFC 2426 vCard <-> hCard
- create an XHTML representation of this
- embed within a webpage, share to and from the web
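A sketch of the 1:1 mapping idea for people, under some assumptions: vCard property names (FN, ORG, TEL from RFC 2426) are lowercased and used as class names inside a container with class "vcard". That naming follows how hCard was later written up and was not spelled out in the talk notes; the helper names and sample data are invented.

```python
import xml.etree.ElementTree as ET


def to_hcard(person):
    """Render a flat dict of vCard-style properties as an XHTML fragment."""
    card = ET.Element("div", {"class": "vcard"})
    for field, value in person.items():
        span = ET.SubElement(card, "span", {"class": field})
        span.text = value
    return ET.tostring(card, encoding="unicode")


def from_hcard(xhtml):
    """Recover the property dict from a fragment produced by to_hcard."""
    card = ET.fromstring(xhtml)
    return {span.get("class"): span.text for span in card.findall("span")}


# Sample data is made up for the example.
person = {"fn": "Ada Lovelace", "org": "Analytical Engines", "tel": "+1-555-0100"}
markup = to_hcard(person)
print(markup)
print(from_hcard(markup) == person)
```

Because each property maps to exactly one element, the round trip loses nothing, and the same markup can be styled with CSS for human readers.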
New type – Events
- RFC 2445 iCalendar <-> hCalendar
- describe events
- display them and enable parsing
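The "display them and enable parsing" point might look like the following: an event marked up for display inside an ordinary page, then pulled back out by class name. The class names (vevent, summary, dtstart, dtend, location) mirror iCalendar property names as hCalendar later documented them and are an assumption here; the page fragment and `events` helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical page with one embedded event; the date goes in a
# machine-readable title while the visible text stays human-friendly.
PAGE = """<div>
  <p>Events:</p>
  <div class="vevent">
    <span class="summary">SDForum Web Services SIG</span>
    <abbr class="dtstart" title="2004-09-28">September 28</abbr>
  </div>
</div>"""

FIELDS = ("summary", "dtstart", "dtend", "location")


def events(xhtml):
    """Collect a dict of known fields for every element with class 'vevent'."""
    root = ET.fromstring(xhtml)
    found = []
    for node in root.iter():
        if "vevent" not in (node.get("class") or "").split():
            continue
        event = {}
        for child in node.iter():
            for name in (child.get("class") or "").split():
                if name in FIELDS:
                    # Prefer the title attribute when present, else the text.
                    event[name] = child.get("title") or child.text
        found.append(event)
    return found


print(events(PAGE))
```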