<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:gml="http://www.opengis.net/gml"
>

<channel>
	<title>LiquidFoot &#187; lucene</title>
	<atom:link href="http://www.liquidfoot.com/tag/lucene/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.liquidfoot.com</link>
	<description>Just another WordPress weblog</description>
	<lastBuildDate>Sat, 17 Apr 2010 16:36:10 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=abc</generator>
		<item>
		<title>Luke for Lucene 2.4</title>
		<link>http://www.liquidfoot.com/2008/08/21/luke-for-lucene-24/</link>
		<comments>http://www.liquidfoot.com/2008/08/21/luke-for-lucene-24/#comments</comments>
		<pubDate>Thu, 21 Aug 2008 19:13:37 +0000</pubDate>
		<dc:creator>Wayne</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[Projects]]></category>
		<category><![CDATA[lucene]]></category>
		<category><![CDATA[luke]]></category>
		<category><![CDATA[solr]]></category>

		<guid isPermaLink="false">http://www.liquidfoot.com/?p=34</guid>
		<description><![CDATA[]]></description>
			<content:encoded><![CDATA[<p>On the Vufind project, I&#8217;ve been migrating the Solrmarc codebase to use Solr 1.3. It&#8217;s got a lot of nifty changes (the multi-core and DataImportHandler are really nice additions), including moving to Lucene 2.4 for the actual indexing.</p>
<p>On the down side, this means that <a href="http://www.getopt.org/luke/">Luke </a>doesn&#8217;t have the correct analyzers or format interpretors (you get an incorrect format excpetion).</p>
<p>The fix is pretty quick. Just download the source tarball/zip for Luke (at the bottom of the page) and extract them somewhere. Then, grab a build of the Lucene 2.4 core, analyzers, and snowball analyzer (I grabbed mine from the <a href="http://people.apache.org/builds/lucene/solr/nightly/">Solr nightly build</a>).</p>
<p>Just throw the jars into Luke&#8217;s lib folder (e.g. ~/luke-src-0.8-dev/lib) and then rebuild with Ant (ant). This will build the jars in the dist folder. You should be able to check out your index to make sure everything is good in the index.</p>
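<p>If you end up rebuilding Luke more than once, the copy-and-build steps are easy to script. Here&#8217;s a minimal Python sketch, assuming Ant is on your PATH and the ~/luke-src-0.8-dev layout above; stage_jars and build_luke are hypothetical helpers of my own, not part of Luke, and the jar paths you pass in should be whatever the nightly build actually ships.</p>

```python
import shutil
import subprocess
from pathlib import Path

# Hypothetical helpers that script the manual steps above; not part of Luke.


def stage_jars(jar_paths, lib_dir):
    """Copy the Lucene 2.4 jars into Luke's lib folder and return the copied paths."""
    lib = Path(lib_dir).expanduser()
    lib.mkdir(parents=True, exist_ok=True)
    copied = []
    for jar in jar_paths:
        src = Path(jar).expanduser()
        dest = lib / src.name  # a jar with the same name already in lib is overwritten
        shutil.copy2(src, dest)
        copied.append(dest)
    return copied


def build_luke(src_dir):
    """Run Ant's default target in the Luke source folder; return the dist folder."""
    src = Path(src_dir).expanduser()
    subprocess.run(["ant"], cwd=src, check=True)  # raises if the build fails
    return src / "dist"
```

<p>Call stage_jars with the paths to the core, analyzers, and snowball jars plus "~/luke-src-0.8-dev/lib", then call build_luke("~/luke-src-0.8-dev").</p>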
<p>If you&#8217;re lazy and just want the one I created, here you go: <a href="http://www.liquidfoot.com/wp-content/uploads/2008/08/luke-08-dev.jar">luke-08-dev</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.liquidfoot.com/2008/08/21/luke-for-lucene-24/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Coldfusion Solr Client &#8211; SolColdfusion</title>
		<link>http://www.liquidfoot.com/2007/10/04/coldfusion-solr-client-solcoldfusion/</link>
		<comments>http://www.liquidfoot.com/2007/10/04/coldfusion-solr-client-solcoldfusion/#comments</comments>
		<pubDate>Thu, 04 Oct 2007 22:19:47 +0000</pubDate>
		<dc:creator>Wayne</dc:creator>
				<category><![CDATA[Computing]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Projects]]></category>
		<category><![CDATA[Apache]]></category>
		<category><![CDATA[coldfusion]]></category>
		<category><![CDATA[lucene]]></category>
		<category><![CDATA[solr]]></category>

		<guid isPermaLink="false">http://www.liquidfoot.com/?p=182</guid>
		<description><![CDATA[As I hinted yesterday, I was close to having some code ready that abstracts away the details of using Solr from ColdFusion. I&#8217;ve finished the initial version, which handles adding, deleting, and searching documents. Here&#8217;s a brief setup guide to start playing with the code. First, you&#8217;re going to need to grab the latest release version of Solr (currently 1.2). [...]]]></description>
			<content:encoded><![CDATA[<div class="body">
<p>As I hinted yesterday, I was close to having some code ready that abstracts away the details of using Solr from ColdFusion. I&#8217;ve finished the initial version, which handles adding, deleting, and searching documents. Here&#8217;s a brief setup guide to start playing with the code.</p>
<p>First, you&#8217;re going to need to grab the latest <a href="http://www.apache.org/dyn/closer.cgi/lucene/solr/">release version of Solr (currently 1.2)</a>. The only real requirement for running it is a Java runtime (JRE) of version 1.5 or higher. Unpack the tarball or zip somewhere convenient, open a command prompt, and change into the example directory inside the apache-solr-1.2.x folder (cd apache-solr-1.2.x/example). To start up the sample server, which runs on Jetty, issue the following command:</p>
<div class="code">java -jar start.jar</div>
<p>This starts a Solr server on your computer, listening on port 8983. To make sure it&#8217;s running, browse to <a href="http://localhost:8983/solr">http://localhost:8983/solr</a> (NOTE: this link points at your own computer, so if you get an error, Solr most likely isn&#8217;t running on port 8983).</p>
<p>At this point, it&#8217;s probably good to send you over to the Solr website to take a look at <a href="http://lucene.apache.org/solr/tutorial.html">their tutorial</a>. Go ahead. I&#8217;ll wait&#8230;</p>
<p>&#8230;</p>
<p>Great, you&#8217;re back.</p>
<p>You&#8217;ve seen some basic inserting, deleting, and querying of Solr index data. You may have also noticed that there are clients for PHP, Ruby, Python, and Java&#8230;but no ColdFusion. I want to do a little more testing before I submit the patch, but I&#8217;ve attached the initial code here as an enclosure; it handles updating, deleting, and searching from ColdFusion.</p>
<p>The SolColdfusion CFC should go in the path org/apache/solr/client (at least, that&#8217;s where I&#8217;m putting it for the purposes of this initial demonstration). Its init method takes one required parameter (the Solr host) and two optional parameters (the port and the path).</p>
<p>To set this up, create an instance:</p>
<div class="code"><span style="color: #800000;">&lt;cfset solr = createObject(<span style="color: #0000ff;">&#8220;component&#8221;</span>, <span style="color: #0000ff;">&#8220;org.apache.solr.client.SolColdfusion&#8221;</span>).init(<span style="color: #0000ff;">&#8220;<a href="http://localhost/" target="_blank">http://localhost</a>&#8220;</span>, <span style="color: #0000ff;">&#8220;8983&#8243;</span>, <span style="color: #0000ff;">&#8220;/solr&#8221;</span>) /&gt;</span></div>
<p>There are a lot of different parameters you can send to Solr to shape a query. Since some parameter names can repeat, I chose to pass them as an array of name/value rows rather than as a struct (which can hold each key only once). Let&#8217;s set one up.</p>
<div class="code"><span style="color: #800000;">&lt;cfset params = arrayNew(<span style="color: #0000ff;">1</span>) /&gt;</span></p>
<p><span style="color: #800000;">&lt;cfset params[1][1] = <span style="color: #0000ff;">&#8220;indent&#8221;</span>&gt;</span><br />
<span style="color: #800000;">&lt;cfset params[1][2] = <span style="color: #0000ff;">&#8220;on&#8221;</span> /&gt;</span><br />
<span style="color: #800000;">&lt;cfset params[2][1] = <span style="color: #0000ff;">&#8220;wt&#8221;</span>&gt;</span><br />
<span style="color: #800000;">&lt;cfset params[2][2] = <span style="color: #0000ff;">&#8220;standard&#8221;</span> /&gt;</span><br />
<span style="color: #800000;">&lt;cfset params[3][1] = <span style="color: #0000ff;">&#8220;fl&#8221;</span> /&gt;</span><br />
<span style="color: #800000;">&lt;cfset params[3][2] = <span style="color: #0000ff;">&#8220;*,score&#8221;</span> /&gt;</span><br />
<span style="color: #800000;">&lt;cfset params[4][1] = <span style="color: #0000ff;">&#8220;qt&#8221;</span> /&gt;</span><br />
<span style="color: #800000;">&lt;cfset params[4][2] = <span style="color: #0000ff;">&#8220;standard&#8221;</span> /&gt;</span><br />
<span style="color: #800000;">&lt;cfset params[5][1] = <span style="color: #0000ff;">&#8220;wt&#8221;</span> /&gt;</span><br />
<span style="color: #800000;">&lt;cfset params[5][2] = <span style="color: #0000ff;">&#8220;standard&#8221;</span> /&gt;</span></div>
<p>These parameters are essentially Solr&#8217;s defaults. If you want highlighting, add two more rows: hl set to &#8216;on&#8217; and hl.fl set to the list of fields you want highlighted.</p>
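<p>To see why rows work better than a struct here: a struct, like a Python dict, keeps only one value per key, while a list of rows can repeat a key. This Python sketch (my own illustration, not taken from SolColdfusion) uses the standard library&#8217;s urlencode to turn rows like the ones above, plus the two highlighting rows, into a query string; the hl.fl field list is only an example.</p>

```python
from urllib.parse import urlencode

# Same name/value rows as the ColdFusion params array, plus highlighting.
# The hl.fl field list ("name,features") is only an example.
PARAMS = [
    ["indent", "on"],
    ["wt", "standard"],
    ["fl", "*,score"],
    ["qt", "standard"],
    ["hl", "on"],
    ["hl.fl", "name,features"],
]


def to_query_string(pairs):
    """Encode [name, value] rows in order; a name that appears twice is sent twice."""
    return urlencode([(name, value) for name, value in pairs])
```

<p>Converting the same rows with dict() first would silently drop any repeated name, which is exactly what the array avoids.</p>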
<p>Searching is straightforward; search() takes a query, the starting row, the number of rows to return, and the array of parameters:</p>
<div class="code"><span style="color: #800000;">&lt;cfset results = solr.search(<span style="color: #0000ff;">&#8220;*:*&#8221;</span>,<span style="color: #0000ff;"> 0</span>,<span style="color: #0000ff;"> 10</span>, params) /&gt;</span></div>
<p>The *:* query matches every document in the index; the call returns the search results as XML, which you can dump to take a look:</p>
<div class="code"><span style="color: #800000;">&lt;cfdump var=<span style="color: #0000ff;">&#8220;#results#&#8221;</span> /&gt;</span></div>
<p>In the result node, you&#8217;ll see that Solr returns a numFound attribute of 0 (assuming you don&#8217;t have anything in the index yet). Let&#8217;s add one of the example documents that come with Solr.</p>
<div class="code"><span style="color: #808080;"><em>&lt;!&#8212; Create a new sample document &#8212;&gt;</em></span><br />
<span style="color: #800000;">&lt;cfxml variable=<span style="color: #0000ff;">&#8220;sample&#8221;</span>&gt;</span><br />
<span style="color: #000080;">&lt;doc&gt;</span><br />
<span style="color: #ff8000;">&lt;field name=<span style="color: #0000ff;">&#8220;id&#8221;</span>&gt;</span>F8V7067-APL-KIT<span style="color: #ff8000;">&lt;/field&gt;</span><br />
<span style="color: #ff8000;">&lt;field name=<span style="color: #0000ff;">&#8220;name&#8221;</span>&gt;</span>Belkin Mobile Power Cord for iPod w/ Dock<span style="color: #ff8000;">&lt;/field&gt;</span><br />
<span style="color: #ff8000;">&lt;field name=<span style="color: #0000ff;">&#8220;manu&#8221;</span>&gt;</span>Belkin<span style="color: #ff8000;">&lt;/field&gt;</span><br />
<span style="color: #ff8000;">&lt;field name=<span style="color: #0000ff;">&#8220;cat&#8221;</span>&gt;</span>electronics<span style="color: #ff8000;">&lt;/field&gt;</span><br />
<span style="color: #ff8000;">&lt;field name=<span style="color: #0000ff;">&#8220;cat&#8221;</span>&gt;</span>connector<span style="color: #ff8000;">&lt;/field&gt;</span><br />
<span style="color: #ff8000;">&lt;field name=<span style="color: #0000ff;">&#8220;features&#8221;</span>&gt;</span>car power adapter, white<span style="color: #ff8000;">&lt;/field&gt;</span><br />
<span style="color: #ff8000;">&lt;field name=<span style="color: #0000ff;">&#8220;weight&#8221;</span>&gt;</span>4<span style="color: #ff8000;">&lt;/field&gt;</span><br />
<span style="color: #ff8000;">&lt;field name=<span style="color: #0000ff;">&#8220;price&#8221;</span>&gt;</span>19.95<span style="color: #ff8000;">&lt;/field&gt;</span><br />
<span style="color: #ff8000;">&lt;field name=<span style="color: #0000ff;">&#8220;popularity&#8221;</span>&gt;</span>1<span style="color: #ff8000;">&lt;/field&gt;</span><br />
<span style="color: #ff8000;">&lt;field name=<span style="color: #0000ff;">&#8220;inStock&#8221;</span>&gt;</span>false<span style="color: #ff8000;">&lt;/field&gt;</span><br />
<span style="color: #000080;">&lt;/doc&gt;</span><br />
<span style="color: #800000;">&lt;/cfxml&gt;</span></p>
<p><span style="color: #808080;"><em>&lt;!&#8212; add this document to the index &#8212;&gt;</em></span><br />
<span style="color: #800000;">&lt;cfset solr.add(sample) /&gt;</span><br />
<span style="color: #800000;">&lt;cfset solr.commit() /&gt;</span><br />
<span style="color: #800000;">&lt;cfset solr.optimize() /&gt;</span></p>
<p><span style="color: #808080;"><em>&lt;!&#8212; search for the newly added document &#8212;&gt;</em></span><br />
<span style="color: #800000;">&lt;cfset results = solr.search(<span style="color: #0000ff;">&#8220;id:F8V7067-APL-KIT&#8221;</span>,<span style="color: #0000ff;"> 0</span>,<span style="color: #0000ff;"> 10</span>, params) /&gt;</span></p>
<p><span style="color: #800000;">&lt;cfdump var=<span style="color: #0000ff;">&#8220;#xmlParse(results)#&#8221;</span> /&gt;</span></div>
<p>You&#8217;ll notice I used both commit and optimize. Neither is necessary every time you add a document, but be aware that Solr won&#8217;t make newly added documents visible to searches until you commit them. Optimizing merges the index into fewer segments and can take a while on a large index, so it&#8217;s something to run occasionally rather than after every add.</p>
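<p>Each of these calls corresponds to a small message in Solr&#8217;s XML update format, sent by HTTP POST to the update handler (http://localhost:8983/solr/update on the example server). I haven&#8217;t reproduced SolColdfusion&#8217;s internals here; this is a Python sketch, with hypothetical helper names, of what those messages look like. A repeated field name, like cat in the sample document, simply becomes two field elements.</p>

```python
import xml.etree.ElementTree as ET

# Hypothetical helpers that build Solr XML update messages; not SolColdfusion's code.


def add_message(fields):
    """Wrap (name, value) pairs in an add/doc message; names may repeat (e.g. cat)."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in fields:
        field = ET.SubElement(doc, "field", name=name)
        field.text = str(value)  # ElementTree escapes special characters for us
    return ET.tostring(add, encoding="unicode")


def command_message(command):
    """Build an empty command element such as commit or optimize."""
    return ET.tostring(ET.Element(command), encoding="unicode")
```

<p>For the sample document you would pass the id, name, manu, both cat values, and the remaining fields as pairs, then send command_message("commit") and, occasionally, command_message("optimize").</p>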
<p>Now, let&#8217;s delete this document&#8230;</p>
<div class="code"><span style="color: #800000;">&lt;cfset solr.deleteById(<span style="color: #0000ff;">&#8220;F8V7067-APL-KIT&#8221;</span>) /&gt;</span><br />
<span style="color: #800000;">&lt;cfset solr.commit() /&gt;</span></div>
<p>Don&#8217;t forget to commit deletions to the index!</p>
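<p>A delete is likewise a tiny update message, and searches only stop returning the document after the commit. Here&#8217;s a Python sketch with the same caveat (delete_message is a hypothetical helper, not part of SolColdfusion); it also builds the delete-by-query form, which is the other kind of delete Solr&#8217;s XML format supports.</p>

```python
import xml.etree.ElementTree as ET

# Hypothetical helper that builds a Solr XML delete message; not SolColdfusion's code.


def delete_message(doc_id=None, query=None):
    """Build a Solr delete message by unique id or by query (pass exactly one)."""
    if (doc_id is None) == (query is None):
        raise ValueError("pass exactly one of doc_id or query")
    delete = ET.Element("delete")
    if doc_id is not None:
        ET.SubElement(delete, "id").text = doc_id
    else:
        ET.SubElement(delete, "query").text = query
    return ET.tostring(delete, encoding="unicode")
```

<p>As with adds, follow the delete message with a commit message so the change shows up in searches.</p>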
<p>There&#8217;ll be more soon (adding multiple documents, deleting by query). In the meantime, try it out. If you have any comments, questions, concerns, whatever, let me know.</p></div>
]]></content:encoded>
			<wfw:commentRss>http://www.liquidfoot.com/2007/10/04/coldfusion-solr-client-solcoldfusion/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>ColdFusion and Solr</title>
		<link>http://www.liquidfoot.com/2007/10/03/coldfusion-and-solr/</link>
		<comments>http://www.liquidfoot.com/2007/10/03/coldfusion-and-solr/#comments</comments>
		<pubDate>Wed, 03 Oct 2007 22:38:43 +0000</pubDate>
		<dc:creator>Wayne</dc:creator>
				<category><![CDATA[General]]></category>
		<category><![CDATA[coldfusion]]></category>
		<category><![CDATA[lucene]]></category>
		<category><![CDATA[solr]]></category>

		<guid isPermaLink="false">http://www.liquidfoot.com/?p=184</guid>
		<description><![CDATA[I&#8217;ve spent the last few months working on some projects that didn&#8217;t really have anything to do with ColdFusion (lots of Java and PHP). One of the projects I&#8217;ve been working with (Vufind.org) uses Solr as its indexing/search engine. That&#8217;s starting to get picked up by some pretty big companies (Netflix just relaunched their search [...]]]></description>
			<content:encoded><![CDATA[<div class="body">
<p>I&#8217;ve spent the last few months working on some projects that didn&#8217;t really have anything to do with ColdFusion (lots of Java and PHP). One of the projects I&#8217;ve been working with (<a href="http://www.vufind.org/">Vufind.org</a>) uses <a href="http://lucene.apache.org/solr/">Solr</a> as its indexing/search engine. That&#8217;s starting to get picked up by some pretty big companies (Netflix just relaunched their search using Solr this week).</p>
<p>I&#8217;ve been working with Solr in Java for a bit now, and I wanted to start building a ColdFusion interface for using it as a search engine (my Lucene code is stuck in open source limbo). One of the cool things about Solr is that it returns results over HTTP (as XML, JSON, or Ruby).</p>
<p>As soon as I get the code finished, I&#8217;ll post it as a patch in Solr.</p></div>
]]></content:encoded>
			<wfw:commentRss>http://www.liquidfoot.com/2007/10/03/coldfusion-and-solr/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Fun with PDFs</title>
		<link>http://www.liquidfoot.com/2007/05/17/fun-with-pdfs/</link>
		<comments>http://www.liquidfoot.com/2007/05/17/fun-with-pdfs/#comments</comments>
		<pubDate>Fri, 18 May 2007 00:12:25 +0000</pubDate>
		<dc:creator>Wayne</dc:creator>
				<category><![CDATA[Projects]]></category>
		<category><![CDATA[hacks]]></category>
		<category><![CDATA[lucene]]></category>
		<category><![CDATA[pdf]]></category>

		<guid isPermaLink="false">http://www.liquidfoot.com/?p=192</guid>
		<description><![CDATA[I&#8217;ve been working with a lot of PDF files lately for a few different projects (see The FlatHat and Card Catalog). With our special collections cards, when you got a result back, Acrobat viewer would blow up the image to around 600%, making for a rather ugly image. For the FlatHat, I really wanted to [...]]]></description>
			<content:encoded><![CDATA[<div class="body">
<p>I&#8217;ve been working with a lot of PDF files lately for a few different projects (see <a href="http://swem.wm.edu/beta/flathat/">The FlatHat</a> and <a href="http://swem.wm.edu/beta/cards/">Card Catalog</a>). With our special collections cards, when you got a result back, Acrobat viewer would blow up the image to around 600%, making for a rather ugly image. For the FlatHat, I really wanted to be able to open a PDF and have the search terms highlighted, so I started hunting for ways to actually do this.</p>
<p>I&#8217;ve been using <a href="http://www.pdfbox.org/">PDFBox</a> to extract text from our PDFs to index with Lucene, so I started there and they clued me in to Adobe&#8217;s <a href="http://partners.adobe.com/public/developer/en/acrobat/PDFOpenParameters.pdf">PDF Open Parameters</a>. This really killed a few birds with one stone.</p>
<p>When I was working on the Flat Hat newspaper, I was originally returning only the page that the search result was on. I had some misgivings about this (what if the story ran across more than one page?), but being able to pass the search query from the engine into the PDF is really nice, since the user doesn&#8217;t have to search through the entire issue to find the context they&#8217;re looking for (e.g. <a href="http://swem.wm.edu/beta/flathat/issues/fh19440301.pdf#search=%22whistle%20bait%22&amp;zoom=125">whistle bait</a>; when I saw that term, I cracked up. Definitely a different era).</p>
<p>Basically, the PDF Open Parameters let you pass commands to a PDF that control how it opens. They go after a &#8220;#&#8221; following the filename (e.g. filename.pdf#zoom=100), and you can string several together with an ampersand (&amp;), with a few caveats:</p>
<ol>
<li>only one digit after a decimal is retained</li>
<li>each parameter, together with its value, can be no more than 32 characters long</li>
<li>you can&#8217;t use the reserved characters (=, #, and &amp;) inside values, and there&#8217;s no way to escape them</li>
<li>if you turn bookmarks off for a PDF that had bookmarks showing, they won&#8217;t go away until the PDF has been rendered</li>
</ol>
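<p>Putting those rules together, a URL builder might look like the Python sketch below. It puts the parameters after a single &#8220;#&#8221;, joins them with ampersands, quotes and percent-encodes a search value the way the whistle bait link does, and rejects any parameter longer than 32 characters. The function name, counting the encoded text toward the limit, and raising instead of truncating are my own assumptions; it doesn&#8217;t enforce the one-decimal rule.</p>

```python
from urllib.parse import quote

# From the caveats above: a parameter plus its value is capped at 32 characters.
MAX_PARAM_LENGTH = 32


def pdf_open_url(pdf_url, params):
    """Append PDF Open Parameters to a PDF URL (hypothetical helper).

    params is a list of (name, value) pairs such as [("page", 3), ("zoom", 125)].
    A search value is wrapped in double quotes and percent-encoded. Length is
    checked on the encoded name=value text, an assumption about how Acrobat counts.
    """
    parts = []
    for name, value in params:
        value = str(value)
        if name == "search":
            value = quote(f'"{value}"')
        part = f"{name}={value}"
        if len(part) > MAX_PARAM_LENGTH:
            raise ValueError(f"{part!r} is longer than {MAX_PARAM_LENGTH} characters")
        parts.append(part)
    # Parameters follow a single "#" and are joined with ampersands.
    return pdf_url + "#" + "&".join(parts)
```

<p>For example, pdf_open_url("fh19391031.pdf", [("page", 4), ("view", "fitH,100")]) produces the same fragment as the fourth example below.</p>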
<p>Anyway, here are some examples of what you can do:</p>
<ul>
<li><a href="http://swem.wm.edu/beta/flathat/issues/fh19391031.pdf#page=3">http://swem.wm.edu/beta/flathat/issues/fh19391031.pdf#page=3</a></li>
<li><a href="http://swem.wm.edu/beta/flathat/issues/fh19391031.pdf#page=3&amp;zoom=150,250,100">http://swem.wm.edu/beta/flathat/issues/fh19391031.pdf#page=3&amp;zoom=150,250,100</a></li>
<li><a href="http://swem.wm.edu/beta/flathat/issues/fh19391031.pdf#pagemode=thumbs">http://swem.wm.edu/beta/flathat/issues/fh19391031.pdf#pagemode=thumbs</a></li>
<li><a href="http://swem.wm.edu/beta/flathat/issues/fh19391031.pdf#page=4&amp;view=fitH,100">http://swem.wm.edu/beta/flathat/issues/fh19391031.pdf#page=4&amp;view=fitH,100</a></li>
<li><a href="http://swem.wm.edu/beta/flathat/issues/fh19391031.pdf#pagemode=none">http://swem.wm.edu/beta/flathat/issues/fh19391031.pdf#pagemode=none</a></li>
</ul>
</div>
]]></content:encoded>
			<wfw:commentRss>http://www.liquidfoot.com/2007/05/17/fun-with-pdfs/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
