About
This is the development journal for the Semantic Web API project. It is a standard interface for accessing RDF data through multiple clients. Sponsored by Semedia and Google.
Membership
Name: James Cerra
Location: Pittsburgh, Pennsylvania, United States

Thursday, October 13, 2005

CVS Heck Rant

Just a reminder for everyone who is listening: Don't ever depend on CVS code! Only develop with released classes; otherwise, you are asking for trouble. At best, you have to keep updating your sandbox or incompatibilities will creep in. At worst, you end up developing two or more applications instead of just the original one. And for Pete's sake, don't release an application that depends on a custom build from CVS. That becomes an unintentional fork, and forks like that really stall development for users of your application.

Friday, September 23, 2005

How does Swapi 0.0.70 look?

I released Swapi 0.0.70, since it has about 70% of my requirements and features implemented. Give it a look, please, and tell me what's up! If you use it with Jena, you need Jena 2.2, but if you use it with OpenRDF Sesame, you should use Sesame 1.2.1. I still would like a Redland implementation, but that will probably have to wait until I figure out how to install and use Redland. (Unless anyone contributes code and unit tests... hint! hint!)

To use Swapi 0.0.70, you'll have to build it. I'll build a jar file for you if anyone asks (so speak up). It requires Java 1.5 (Java 5) to build and run. Generics just make life so much easier! :-) You don't have to use both Jena and Sesame when using it; however, you should have both installed when building Swapi. A future release (before 0.1, I promise) will do away with this requirement.

The unit tests are an excellent example of how to use Swapi, and I highly recommend looking at them. In a nutshell, each abstract test class declares an abstract connect method that constructs the connection to your implementation of choice. Subclasses of those abstract classes implement connect for a particular RDF backend. This involves configuring the repositories/modelmakers, loading the data into them, and creating the appropriate implementation of Connection from either Jena or Sesame or your own driver. A later release (before 0.1) will provide the DataSource API for making this easier.
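The pattern is language-neutral, so here is a minimal sketch of it in Python rather than Java. Nothing in it is Swapi's real API: Connection, InMemoryConnection, AbstractConnectionTest, and the add/contains methods are hypothetical names, and the in-memory backend is only a stand-in for Jena or Sesame.

```python
import unittest
from abc import ABC, abstractmethod


class Connection(ABC):
    """Hypothetical stand-in for a backend-neutral connection interface."""

    @abstractmethod
    def add(self, subject, predicate, obj): ...

    @abstractmethod
    def contains(self, subject, predicate, obj): ...


class InMemoryConnection(Connection):
    """Toy backend; a real driver would wrap Jena, Sesame, or similar."""

    def __init__(self):
        self._triples = set()

    def add(self, subject, predicate, obj):
        self._triples.add((subject, predicate, obj))

    def contains(self, subject, predicate, obj):
        return (subject, predicate, obj) in self._triples


class AbstractConnectionTest:
    """Backend-independent tests; subclasses only supply connect()."""

    def connect(self):
        raise NotImplementedError("subclasses choose the backend here")

    def test_added_triple_is_found(self):
        conn = self.connect()
        conn.add("ex:swapi", "rdf:type", "doap:Project")
        self.assertTrue(conn.contains("ex:swapi", "rdf:type", "doap:Project"))

    def test_missing_triple_is_not_found(self):
        conn = self.connect()
        self.assertFalse(conn.contains("ex:swapi", "rdf:type", "doap:Version"))


class InMemoryConnectionTest(AbstractConnectionTest, unittest.TestCase):
    """Concrete test class: configure the backend and return a Connection."""

    def connect(self):
        return InMemoryConnection()
```

Supporting another backend then means writing one more small subclass that overrides connect; the test methods themselves are inherited unchanged.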

The next steps involve defining the query API and implementations. The general design is very loosely based on an early version of the SPARQL Protocol. Also, I wanted to build off of Ryan Levering's SPARQL Engine, but the latest version (0.5) ties into the CVS beta of OpenRDF. That is incompatible with OpenRDF 1.2.1, so unless I find a simple solution I may have to fork the engine. :-/

Any suggestions for Swapi are greatly appreciated!

Saturday, September 17, 2005

Sick as a dog

Believe it or not, I actually scheduled a giant CVS, documentation, and blog update today. However, I have the flu and feel horrible. Updates later this week. :-(

In the meantime, check out these updates:

Hope your weekend is better than mine! :-)

Wednesday, July 13, 2005

Dialogue and Apache

Small update. I added a dialog between Giovanni Tummarello (mentor) and Jeen Broekstra (of Openrdf.org) that took place a few weeks ago. Lots of insight between them.

Apache Madness

I also configured Apache to serve a description of the project at the same URI as the HTML home page. This is because both the web site and the DOAP document describe the project. That is, they are different representations of the same thing. One is human-readable while the other is machine-readable. So they both refer to the same resource - the SWAPI project.

Type Maps

I had a hard time coaxing Apache into doing content negotiation. My first attempt involved type maps (with the extension var). To get it working, I had to name the Apache type map file index.html.var, since Apache by default looks for anything named index.html - yuck! The contents of the file were:

URI: home.html
Content-type: text/html

URI: doap.html
Content-type: application/rdf+xml

What that does is set two documents to be loaded depending on the content type specified by the user agent requesting the resource.

MultiViews

Christopher Schmidt pointed out a better way to configure Apache to do this. You scrap the var file because it is now redundant. Instead, you configure the .htaccess file (that's the entire file name) to have the content:

Options +Multiviews
AddHandler type-map var
DirectoryIndex index
AddType application/xhtml+xml;qs=0.9 .xhtml+xml .xhtml
AddType text/html;qs=.9 .html
AddType application/rdf+xml;qs=0.8 .rdf+xml .rdf
AddType application/xml;qs=0.3 .xml

The first line tells Apache to use MultiViews. This lets it automatically fake a type map based on the file extensions of a file. The second line is unimportant (it just lets type maps work, but they aren't necessary anymore).

The third line instructs Apache to look for every file whose name, minus its extensions, is index. Think of it like index.* (but it also matches index.en.html and index.html.fr, for example). Then you put multiple files in a directory and modify their extensions. A user agent will tell Apache which content types it wants; Apache will automatically select the appropriate file and serve that as the index. I used the files:

  • index.html
  • index.xhtml+xml
  • index.rdf+xml

So how does Apache determine which file to send? That's what the last lines do. They instruct Apache to serve anything with a certain extension (like html) as a certain content type (like text/html). The ;qs= parts are parameters that modify the "quality" of representations in that media type. They range from 0 to 1, where 1 marks the preferred representation to serve and 0 marks the worst one. For example, an image/png document may be the preferred form of the Mona Lisa, while a text/plain document (with ASCII art) is the worst, besides a 404 of course. So those last lines say how to serve each type, and that:

  1. application/xhtml+xml and text/html are the best forms.
  2. application/rdf+xml is the next best type
  3. application/xml is not a good type, but better than some
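To make the ranking concrete, here is a small Python sketch that reads the AddType lines from the .htaccess above and sorts the media types by their qs value. The helper names parse_addtype and server_ranking are made up for this example, and the default of 1.0 for a missing qs is an assumption of the sketch; it only mimics how the values compare and does not call into Apache.

```python
# The AddType lines copied from the .htaccess shown earlier.
HTACCESS = """\
AddType application/xhtml+xml;qs=0.9 .xhtml+xml .xhtml
AddType text/html;qs=.9 .html
AddType application/rdf+xml;qs=0.8 .rdf+xml .rdf
AddType application/xml;qs=0.3 .xml
"""


def parse_addtype(line):
    """Split one AddType directive into media type, qs value, and extensions."""
    _, type_spec, *extensions = line.split()
    media_type, _, params = type_spec.partition(";")
    qs = 1.0  # assumed default when no qs parameter is given
    for param in params.split(";"):
        name, _, value = param.partition("=")
        if name.strip() == "qs":
            qs = float(value)
    return media_type, qs, extensions


def server_ranking(config_text):
    """Return (media type, qs) pairs, best first; ties keep file order."""
    parsed = [
        parse_addtype(line)
        for line in config_text.splitlines()
        if line.startswith("AddType")
    ]
    return sorted(((t, qs) for t, qs, _ in parsed), key=lambda pair: -pair[1])


for media_type, qs in server_ranking(HTACCESS):
    print(f"{qs:.1f}  {media_type}")
# 0.9  application/xhtml+xml
# 0.9  text/html
# 0.8  application/rdf+xml
# 0.3  application/xml
```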

Web browsers rank their preferred content types too. As I understand it, they send a list of the types they accept and how important each is to the browser. Apache filters out the types it doesn't support, then drops the browser's lower-ranked types (those below the highest-ranked available ones), then keeps only the best of what remains according to .htaccess, and finally picks a random one from the final list (you should only have one here, for consistency).
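Here is a rough Python model of that selection, for anyone who wants to play with the numbers. It is a simplification and an assumption on my part, not Apache's code: it multiplies the browser's q value by the server's qs value for each file and keeps the highest product, which is the core step in Apache's documented algorithm, and it ignores language, charset, encoding, and the other tie-breakers. The function names and the Accept headers are invented for illustration.

```python
def parse_accept(header):
    """Parse an Accept header into {media range: q}, defaulting q to 1.0."""
    prefs = {}
    for item in header.split(","):
        media_range, *params = [part.strip() for part in item.split(";")]
        q = 1.0
        for param in params:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                q = float(value)
        if media_range:
            prefs[media_range] = q
    return prefs


def client_q(prefs, media_type):
    """Look up q for a type: exact match first, then type/*, then */*."""
    major = media_type.split("/")[0]
    for candidate in (media_type, major + "/*", "*/*"):
        if candidate in prefs:
            return prefs[candidate]
    return 0.0  # not acceptable to this client


def negotiate(accept_header, variants):
    """Pick the file with the highest q * qs, or None if nothing is acceptable.

    variants maps file name -> (media type, qs).
    """
    prefs = parse_accept(accept_header)
    best_file, best_score = None, 0.0
    for file_name, (media_type, qs) in variants.items():
        score = client_q(prefs, media_type) * qs
        if score > best_score:
            best_file, best_score = file_name, score
    return best_file


# The three index files, with the qs values from the .htaccess.
VARIANTS = {
    "index.xhtml+xml": ("application/xhtml+xml", 0.9),
    "index.html": ("text/html", 0.9),
    "index.rdf+xml": ("application/rdf+xml", 0.8),
}

# Illustrative Accept headers, not captured from real clients.
print(negotiate("application/xhtml+xml, text/html;q=0.9, */*;q=0.5", VARIANTS))
# index.xhtml+xml
print(negotiate("text/html, */*;q=0.1", VARIANTS))
# index.html
print(negotiate("application/rdf+xml", VARIANTS))
# index.rdf+xml
```

Note that a product of scores behaves a little differently from the stage-by-stage filtering described above, so treat the two as approximations of each other.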

This way, you can serve application/xhtml+xml to Mozilla and Opera, text/html to Internet Explorer (which doesn't support the XHTML media type), and application/rdf+xml to semantic web programs searching for RDF data. I went with this solution, and it seems to work.
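From the client side, a semantic web program only has to say what it wants in the Accept header. A minimal Python sketch, assuming a placeholder URL (example.org is not the real project address) and a made-up helper name rdf_request; it builds the request without sending it:

```python
from urllib.request import Request

# Hypothetical project URL; substitute the real home page address.
PROJECT_URL = "http://example.org/swapi/"


def rdf_request(url):
    """Build (but do not send) a GET request that asks for RDF/XML first."""
    return Request(
        url,
        headers={"Accept": "application/rdf+xml, application/xml;q=0.5"},
    )


request = rdf_request(PROJECT_URL)
print(request.get_method(), request.full_url)
print("Accept:", request.get_header("Accept"))
```

Passing the request to urllib.request.urlopen would actually fetch the document; a browser hitting the same URI with its own Accept header would get the HTML page instead.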

Monday, July 11, 2005

Well, it's done.

No, I don't mean that the project is finished. If only! ;-) The web site for SWAPI is finally in a useful state. There are always things to improve, but the core of the site seems to be ready. Now the hard part begins... actually creating the framework and gathering a community. Can't wait.

Tools to easily create SWAPI

There were several tool requirements for SWAPI. Google stipulated that all projects must be developed in the open. My mentoring organization doesn't have an in-house environment, so I decided to use something similar to my existing project on Java.net. SWAPI is also similar in some ways to SAX, and I want to follow the development pattern they set forth.

I also observed how a wiki for the Atom project helped to document the decisions that led to its design. I like wikis; the way they're organized matches the way I think (if you can't tell :-)). MoinMoin seemed to be the easiest to install and configure, and it is relatively pretty too. I used the default stylesheet as the base for the SWAPI web site's look and feel.

Finally, all the cool people are blogging about their work, so I should too. :-p Actually, I think there are some concrete advantages to a developer journal. They allow programmers to easily report on their current status and thinking patterns. I also use them to document ideas that I will probably forget in six months (when they're actually needed of course). They are also a good way of receiving community feedback and ideas in different ways from wikis.

Choosing a host for the project

The project needed a fast host with version control, issue tracking, and remote login systems. After some debate, I chose Ourproject.org to host SWAPI. There were some configuration errors with the new account, but Ourproject's staff quickly corrected those mistakes. The site is fast and responsive, and the environment is a standard Debian box, so it was easy to create a web site while also maintaining security. All in all, they provided the nicest experience.

Several other communities were also considered. Tigris didn't have ssh access, which made it hard to set up a web site. It was also a confusing web site; the documentation was very unhelpful. Eduforge provided access through ssh, but the environment was crippled to only support CVS administration. (You couldn't even make a directory from the shell!) That's too bad because a community started forming around the project there. For that reason, I'm considering keeping SWAPI's Eduforge project page active and synced with the tools at the Ourproject page.

SourceForge provided a well-administered environment; everything just worked. They had the most comprehensive documentation too. However, the SourceForge platform itself is not open source. That is kinda weird, which prompted my investigation into other organizations using an open source client (namely Gforge). Also, their servers were quite slow and the shell environment prevented external web access. While I understand the reasons, this was quite a problem since my connection to the internet is very slow - averaging 36kps!

My previous bad experience with SourceForge also made me shy about using them again. (It took over a month to get an earlier project approved, yet it was canceled after a few weeks of inactivity.) Finally, SourceForge is really really ugly! Not that I'm Leonardo da Vinci of course. ;-) A shame since they certainly have the largest community around. ::shrug::

Beyond 2000

The next step on the agenda is to aggregate all of the various API standards flying around the RDF community. There has already been some work on documenting the various applications, so I don't think it will take long. I also plan on documenting the initial ideas for the project's design. There are several directions I investigated during the last six months. They need to be digitized and properly documented. Finally, I also have to promote the project, something I'm very poor at.

There are just enough back-end tools becoming mature that application development is becoming really exciting. The use cases for the semantic web are starting to be implemented. However, the existing tools are still few and young enough - entering their first or second version - that the direction the community is moving toward can be easily inferred. That's one reason I feel it's an ideal time to create a standard interface.

Later.

Saturday, July 09, 2005

Summer of Code

Thank You Very Much Semedia and Google.