Thursday, 10 December 2009

Generate RSS in JSP? Set the locale!

The RSS pubDate field contains a date in a format that may be human-readable but is not very handy from a programmer's point of view. According to the RSS specification, the date must conform to RFC 822. A typical RSS snippet looks like:

<item>
  <title>Generate RSS in JSP? Set the locale!</title>
  <description>Description of the feed item</description>
  <link>http://blog.jasha.eu/2009/12/generate-rss-in-jsp-set-locale.html</link>
  <guid>http://blog.jasha.eu/2009/12/generate-rss-in-jsp-set-locale.html</guid>
  <pubDate>Thu, 10 Dec 2009 09:55:00 +0100</pubDate>
</item>

An easy way to generate this from JSP is to use the fmt library from JSTL:

<%@ taglib prefix="fmt" uri="http://java.sun.com/jsp/jstl/fmt" %>

<item>
  <title>${item.title}</title>
  <description>
    <![CDATA[<hst:html hippohtml="${item.html}"/>]]>
  </description>
  <link>${link}</link>
  <guid>${link}</guid>
  <pubDate>
    <fmt:formatDate value="${item.date.time}" 
      pattern="EE, dd MMM yyyy HH:mm:ss Z"/>
  </pubDate>
</item>

This looks fine when you request the RSS feed from your browser. However, when you request the same feed from a Java library (ROME), a request generator (Fiddler) or simply wget, the format of the pubDate field is invalid. It returns "Thu Dec 10 09:55:00 CET 2009" instead of "Thu, 10 Dec 2009 09:55:00 +0100" as the value of pubDate. What happened?

Your browser sends an "Accept-Language" header with, hopefully, "en-US" as its first value. The JSTL library uses the en-US locale to format the date and the feed gets parsed correctly. The other tools don't send this header, and the programmer did not specify a locale for fmt. Without a locale, fmt does not know how to format the date and falls back to the output of Date.toString().

As a requester of the feed, you can work around this by manually adding the "Accept-Language" header to the request with the value "en-US". For the JSP developer: the locale used to format RSS dates should always be "en_US". Just add this line to your JSP:

<fmt:setLocale value="en_US"/>
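The effect of the locale can also be reproduced outside JSP. The following is a minimal sketch in plain Java, assuming java.text.SimpleDateFormat as a stand-in for what fmt:formatDate does internally; the class name RssDateDemo, the fixed GMT+01:00 zone and the sample timestamp are illustrative choices, not part of the original JSP.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical demo class: formats one instant with the RSS date pattern
// under different locales to show why the locale must be pinned.
final class RssDateDemo {
    // RFC 822 style pattern as used for the pubDate field.
    static final String PATTERN = "EEE, dd MMM yyyy HH:mm:ss Z";

    static String format(Date date, Locale locale, TimeZone zone) {
        SimpleDateFormat formatter = new SimpleDateFormat(PATTERN, locale);
        formatter.setTimeZone(zone);
        return formatter.format(date);
    }

    public static void main(String[] args) {
        // 2009-12-10 08:55:00 UTC, a Thursday (assumed sample value).
        Date published = new Date(1260435300000L);
        TimeZone zone = TimeZone.getTimeZone("GMT+01:00");

        // With an English locale the day and month names are valid for RSS.
        System.out.println(format(published, Locale.US, zone));
        // With another locale the names are localised, which feed parsers
        // generally will not accept.
        System.out.println(format(published, Locale.GERMANY, zone));
    }
}
```

The first line printed is "Thu, 10 Dec 2009 09:55:00 +0100"; the second depends on the locale data shipped with the JDK, which is exactly why the locale should not be left to the request.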

Sunday, 25 October 2009

Generate the robots.txt from Hippo CMS

The robots.txt is a response from your website that is unimportant for your human visitors but very important for search engine crawlers. That's why we created a Hippo CMS / Hippo Site Toolkit (HST) plugin to manage the robots.txt in the CMS and return the proper output.

Screenshot 1: robots.txt configuration in the CMS

The plugin comes with an out-of-the-box document type to manage the parts of the site that search robots are not allowed to crawl. There's usually one configuration for all crawlers, but if you want, you can add separate configurations per crawler.

In the first screenshot, all crawlers should skip "/donotindex/" and "/search/", only "Googlebot" should ignore "/hide/for/googlebot", and the non-existent "EvilBot" is kindly requested not to index the site at all.

Generating the response is mostly a matter of configuring the HST to handle the request for "robots.txt". The plugin comes with a demo project and documentation on how to configure it for your existing project.

Screenshot 2: robots.txt response from the site

In the second screenshot you see the response by the HST for the "robots.txt" request. The HST returns a plain text response with all the fields we've configured in the CMS.
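To give an idea of what such a plain text response looks like, here is a rough sketch that renders the example configuration described above. It is not the plugin's code: the class RobotsTxtSketch and its render method are hypothetical, and the configuration is modelled as a simple map from user agent to disallowed paths.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the plugin's implementation: renders one
// robots.txt section per user agent from an in-memory configuration.
final class RobotsTxtSketch {
    static String render(Map<String, List<String>> disallowedByAgent) {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, List<String>> section : disallowedByAgent.entrySet()) {
            if (out.length() > 0) {
                out.append('\n'); // blank line between sections
            }
            out.append("User-agent: ").append(section.getKey()).append('\n');
            for (String path : section.getValue()) {
                out.append("Disallow: ").append(path).append('\n');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the sections in the order they were added.
        Map<String, List<String>> config = new LinkedHashMap<>();
        config.put("*", Arrays.asList("/donotindex/", "/search/"));
        config.put("Googlebot", Arrays.asList("/hide/for/googlebot"));
        config.put("EvilBot", Arrays.asList("/"));
        System.out.print(render(config));
    }
}
```

Running the sketch prints a "User-agent: *" section with two Disallow lines, followed by separate sections for Googlebot and EvilBot.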

Friday, 25 September 2009

Fast internet

Just ran a test on Speedtest for my fibre-optic connection at home.

Speedtest result

I like :-)

Thursday, 9 July 2009

The Google Future

It's 2012. EU regulations require a choice between at least 2 operating systems when buying a new computer. Since the introduction of these regulations the Windows market share dropped to 15% on new computers, against 65% for Google Chrome OS (GCO).

During the installation of GCO on my new laptop, all I have to do is fill in my Google account details. During the rest of the installation I can just click the next button because all other details like my name, address and current location are retrieved from my Google account. After the installation I click through my hard disk and am surprised by what's already there. All my Google Docs are available offline to be edited with the Google Office suite, which includes a word processor, spreadsheet, presentation tool, database engine, image editor and email. I find copies of my music and video collection I previously downloaded from the Google Media Store and all my pictures from Picasa. The default instant messenger is, not surprisingly, Google Messenger. Twitter was integrated into Google Talk after the takeover in 2010 and Google started its own IP telephony service in 2011, pushing Skype out of the market.

I double-click on my favourite song and it opens in the Google Media Player. Suggestions for related songs and concerts appear on the right-hand side of the screen. I synchronise my media files to my Gune, Google's answer to the iPod. Between every song I hear a two-second advertisement, the drawback of the free downloads from the Google Media Store.

It's time to fill in the spreadsheet for last month's business expenses. While I fill in the declaration, advertisements appear for places to lunch and car rental companies. A bit annoyed by the number of ads, I look at the pictures from my last holiday. Suddenly my pictures are accompanied by advertisements for travel agents. When I want to tell my friends about these annoying advertisements, Google Messenger comes up with even more "related" advertisements.

Is this where we want to go tomorrow?

Thursday, 4 June 2009

Hippo Site Toolkit Query interface and pagination

Today an interesting question came from a developer at one of our implementation partners. He wanted to list items from our JCR repository and use pagination. This post summarises the conversation.

The query was:

HstQuery query = getQueryManager().createQuery(requestContext, scope, filterBean);
Filter filter = query.createFilter();
filter.addContains(".", "my keywords");
query.setFilter(filter);
HstQueryResult queryResult = query.execute();

The developer tried to get a paged result and the total number of items with:

query.setOffset(0);
query.setLimit(10);

HippoBeanIterator hits = queryResult.getHippoBeans();        
hits.getSize();

The result did indeed contain 10 items, but hits.getSize() also returned 10. What went wrong? Ard Schrijvers explained:

If you use setLimit(3), you will get at most 3 hits, never more. getSize() on the queryResult then returns at most 3, even if the search criteria matched hundreds of documents. If you use setOffset(10) and no limit, getSize() returns just 10 fewer hits than it would without the offset.

The limit exists for performance reasons and is well suited to cases such as "show the last 3 agenda items on the homepage".

If you need paging and only want 10 results, do not use setLimit(int limit).

Instead, just run the query without setLimit(). You'll get back a queryResult, from which you get a HippoBeanIterator. This is a normal Iterator with some extensions. A very important one is the method skip(int skipNum).

From the HippoBeanIterator, you can simply iterate over the beans you need. Make sure you use skip(int skipNum) to jump to the correct place. If the current page is 11 (counting from 0) and the page size is 20, set skipNum to 11 × 20 = 220.

Then fill your List in a for loop that runs from skipNum to skipNum + pageSize.
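The skip-then-fill approach can be sketched as follows. To keep the example self-contained it uses a plain java.util.Iterator, and the helper skip below is only a stand-in for HippoBeanIterator.skip(int skipNum); the class name PagingSketch and the zero-based pageIndex parameter are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch of paging over an iterator without using setLimit().
final class PagingSketch {
    // Stand-in for HippoBeanIterator.skip(int). Unlike the real method,
    // this naive version still walks over the skipped elements.
    static <T> void skip(Iterator<T> it, int skipNum) {
        for (int i = 0; i < skipNum && it.hasNext(); i++) {
            it.next();
        }
    }

    // Returns the items of one page; pageIndex is zero-based (assumption).
    static <T> List<T> page(Iterator<T> hits, int pageIndex, int pageSize) {
        int skipNum = pageIndex * pageSize;
        skip(hits, skipNum);
        List<T> items = new ArrayList<>(pageSize);
        for (int i = 0; i < pageSize && hits.hasNext(); i++) {
            items.add(hits.next());
        }
        return items;
    }

    public static void main(String[] args) {
        List<Integer> all = new ArrayList<>();
        for (int i = 0; i < 250; i++) {
            all.add(i);
        }
        List<Integer> page = page(all.iterator(), 11, 20);
        // prints "220 .. 239"
        System.out.println(page.get(0) + " .. " + page.get(page.size() - 1));
    }
}
```

The hasNext() check in the fill loop means the last page simply comes back shorter when fewer than pageSize items remain.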

What about performance?

The skip is propagated to the JCR NodeIterator. If you want to display item 100-110, and you use skip(100), still only 10 Beans will be created. JCR nodes for Beans 0-99 are not fetched.

What if you need the total number of hits?

Use getSize(). Calling getSize() on the HstQueryResultImpl or on the HippoBeanIteratorImpl does not actually populate the entire iterator with HippoBeans. The call is passed through to the JCR NodeIterator, which in Jackrabbit is a lazy-loading iterator, and from there getSize() is propagated to the executed query without fetching any actual nodes.
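With the total number of hits available, the number of pages for a pager follows from a ceiling division. The snippet below is a small sketch with a hypothetical helper pageCount; it treats a non-positive total as zero pages, on the assumption that the size may be reported as -1 when it is unknown, as the JCR RangeIterator contract allows.

```java
// Hypothetical helper: derives the number of pages from a total hit count.
final class PageCountSketch {
    static long pageCount(long totalHits, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        if (totalHits <= 0) {
            return 0; // no hits, or size unknown (-1)
        }
        // Ceiling division: a partially filled last page still counts.
        return (totalHits + pageSize - 1) / pageSize;
    }

    public static void main(String[] args) {
        System.out.println(pageCount(250, 20)); // prints 13
        System.out.println(pageCount(240, 20)); // prints 12
    }
}
```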