<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-5836842512400214870</id><updated>2011-07-31T20:40:26.608-07:00</updated><category term='File'/><category term='Basics'/><category term='Java'/><category term='Search Engines'/><category term='IO'/><title type='text'>Programming made easy</title><subtitle type='html'></subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://javamilestone.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5836842512400214870/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://javamilestone.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>Aarya</name><uri>http://www.blogger.com/profile/15381592351708404781</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>8</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-5836842512400214870.post-1691563815953127060</id><published>2008-08-30T03:12:00.000-07:00</published><updated>2008-08-30T07:35:19.038-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Java'/><category scheme='http://www.blogger.com/atom/ns#' term='Search Engines'/><title type='text'>Configuring Apache Nutch</title><content type='html'>&lt;strong&gt;&lt;span 
style="font-family:courier new;font-size:180%;color:#cc0000;"&gt;Configuring Apache Nutch to search local files and online files. &lt;/span&gt;&lt;/strong&gt;&lt;br /&gt;&lt;strong&gt;&lt;span style="font-family:courier new;font-size:85%;color:#cc0000;"&gt;(please add comments if you found this post useful)&lt;br /&gt;&lt;/span&gt;&lt;/strong&gt;&lt;br /&gt;&lt;a name="_MailAutoSig"&gt;&lt;span style="color:#000000;"&gt;Nutch is an open-source search tool that we can configure to search our internal documents. We can also use it to search the web.&lt;/span&gt;&lt;/a&gt;&lt;span style="color:#000000;"&gt;&lt;br /&gt;For the full documentation, visit:&lt;/span&gt; &lt;a href="http://lucene.apache.org/nutch/"&gt;http://lucene.apache.org/nutch/&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;Below is the software and configuration needed to run Nutch:&lt;/p&gt;&lt;p&gt;&lt;strong&gt;&lt;em&gt;&lt;span style="font-size:130%;color:#009900;"&gt;Software&lt;/span&gt;&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;&lt;ul&gt;&lt;li&gt;&lt;strong&gt;&lt;em&gt;&lt;span style="color:#000000;"&gt;Nutch 0.9: &lt;/span&gt;&lt;a href="http://www.apache.org/dyn/closer.cgi/lucene/nutch/"&gt;&lt;span style="color:#3333ff;"&gt;http://www.apache.org/dyn/closer.cgi/lucene/nutch/&lt;/span&gt;&lt;/a&gt;&lt;span style="color:#000000;"&gt; (If something does not work at some stage and you suspect a problem in the source, try checking it out from their svn: &lt;/span&gt;&lt;a href="http://svn.apache.org/repos/asf/lucene/nutch/branches/branch-0.9/"&gt;&lt;span style="color:#3333ff;"&gt;http://svn.apache.org/repos/asf/lucene/nutch/branches/branch-0.9/&lt;/span&gt;&lt;/a&gt;&lt;span style="color:#000000;"&gt; )&lt;/span&gt;&lt;/em&gt;&lt;/strong&gt;&lt;/li&gt;&lt;strong&gt;&lt;em&gt;&lt;li&gt;&lt;span style="color:#000000;"&gt;Java JDK 6: &lt;/span&gt;&lt;a 
href="http://java.sun.com/javase/downloads/index.jsp"&gt;&lt;span style="color:#3333ff;"&gt;http://java.sun.com/javase/downloads/index.jsp&lt;/span&gt;&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="color:#000000;"&gt;Apache Tomcat web server 6: &lt;/span&gt;&lt;a href="http://tomcat.apache.org/download-60.cgi"&gt;&lt;span style="color:#3333ff;"&gt;http://tomcat.apache.org/download-60.cgi&lt;/span&gt;&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="color:#000000;"&gt;Cygwin: &lt;/span&gt;&lt;a href="http://www.cygwin.com/"&gt;&lt;span style="color:#3333ff;"&gt;http://www.cygwin.com/&lt;/span&gt;&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="color:#000000;"&gt;Apache Ant: &lt;/span&gt;&lt;a href="http://ant.apache.org/bindownload.cgi"&gt;&lt;span style="color:#3333ff;"&gt;http://ant.apache.org/bindownload.cgi&lt;/span&gt;&lt;/a&gt;&lt;span style="color:#000000;"&gt; &lt;/span&gt;&lt;/li&gt;&lt;/ul&gt;&lt;/em&gt;&lt;/strong&gt;&lt;strong&gt;&lt;em&gt;&lt;p&gt;&lt;br /&gt;&lt;span style="color:#000000;"&gt;You can install this software in any directory you want. For simplicity, I have listed the directories I used.&lt;/span&gt;&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;p align="left"&gt;&lt;strong&gt;&lt;em&gt;&lt;span style="font-size:130%;color:#009900;"&gt;Step 1: Install Nutch&lt;/span&gt;&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;&lt;p align="left"&gt;Unzip Nutch to C:\nutch-0.9. If you are getting the source from svn, check it out to C:\nutch-0.9.&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;&lt;strong&gt;&lt;em&gt;&lt;span style="font-size:130%;color:#009900;"&gt;Step 2: Install Java&lt;/span&gt;&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;Install Java to C:\Program Files\&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;&lt;strong&gt;&lt;em&gt;&lt;span style="font-size:130%;color:#009900;"&gt;Step 3: Install Apache Tomcat&lt;/span&gt;&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;Install Tomcat and run it. 
Make sure that it is running at &lt;a href="http://localhost:8080/"&gt;http://localhost:8080/&lt;/a&gt;, or at another port if you configured a custom port number.&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;&lt;strong&gt;&lt;em&gt;&lt;span style="font-size:130%;color:#009900;"&gt;Step 4: Install Cygwin&lt;/span&gt;&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;This provides a Linux-like environment for running the commands.&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;&lt;strong&gt;&lt;em&gt;&lt;span style="font-size:130%;color:#009900;"&gt;Step 5: Set JAVA_HOME and update the path&lt;br /&gt;&lt;/span&gt;&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;Set the JAVA_HOME environment variable [Eg: C:\Program Files\Java\jdk1.6.0_05 ].&lt;br /&gt;Add %JAVA_HOME%\bin to the PATH environment variable &lt;/span&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;&lt;strong&gt;&lt;em&gt;&lt;span style="font-size:130%;color:#009900;"&gt;Step 6: Install Apache Ant &lt;/span&gt;&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;Install Apache Ant, e.g. to C:\apache-ant-1.7.0\&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;&lt;strong&gt;&lt;em&gt;&lt;span style="font-size:130%;color:#009900;"&gt;Step 7: Add Ant to the path&lt;/span&gt;&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;Add the Ant bin directory to the PATH environment variable [Eg: C:\apache-ant-1.7.0\bin]&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;&lt;strong&gt;&lt;em&gt;&lt;span style="font-size:130%;color:#009900;"&gt;Step 8: Build the project&lt;/span&gt;&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;From the command prompt:&lt;br /&gt;cd C:\nutch-0.9&lt;br /&gt;ant&lt;br /&gt;ant war&lt;/span&gt;&lt;br /&gt;&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;&lt;strong&gt;&lt;em&gt;&lt;span style="font-size:130%;color:#009900;"&gt;Step 9: Testing Current Environment &lt;/span&gt;&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;Open the cygwin console and run:&lt;br /&gt;cd c:/&lt;br /&gt;cd nutch-0.9/bin/&lt;br /&gt;./nutch &lt;/span&gt;&lt;br /&gt;&lt;strong&gt;THE OUTPUT WILL BE:&lt;br 
/&gt;&lt;/strong&gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;Usage: nutch COMMAND&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;where COMMAND is one of:&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;crawl one-step crawler for intranets&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;readdb read / dump crawl db&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;convdb convert crawl db from pre-0.9 format&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;mergedb merge crawldb-s, with optional filtering&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;readlinkdb read / dump link db&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;inject inject new urls into the database&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;generate generate new segments to fetch from crawl db&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;freegen generate new segments to fetch from text files&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;fetch fetch a segment's pages&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;fetch2 fetch a segment's pages using Fetcher2 implementation&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;parse parse a segment's pages&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;readseg read / dump segment data&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;mergesegs merge several segments, with optional filtering and slicing&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;updatedb update crawl db from segments after fetching&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;invertlinks create a linkdb from parsed segments&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;mergelinkdb merge linkdb-s, with optional filtering&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;index run the indexer on parsed segments and linkdb&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;merge merge several segment indexes&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;dedup remove duplicates from a set of segment 
indexes&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;plugin load a plugin and run one of its classes main()&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;server run a search server&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;or&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;CLASSNAME run the class named CLASSNAME&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;Most commands print help when invoked w/o parameters.&lt;br /&gt;&lt;/p&gt;&lt;br /&gt;&lt;p&gt;&lt;strong&gt;&lt;em&gt;&lt;span style="font-size:130%;color:#009900;"&gt;&lt;br /&gt;Step 10: Create the urls directory &lt;/span&gt;&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;Now create a directory called ‘urls’ inside C:\nutch-0.9&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;&lt;strong&gt;&lt;em&gt;&lt;span style="font-size:130%;color:#009900;"&gt;Step 11: Create a file listing the urls to crawl &lt;/span&gt;&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;Create a file with any name inside the ‘urls’ directory to list the urls to crawl. 
I have created a file named source.txt. Enter the sites which are to be crawled, one per line.&lt;br /&gt;E.g.:&lt;br /&gt;&lt;strong&gt;file:///c:/MySearch/samplefiles/&lt;br /&gt;http://www.apache.org/&lt;/strong&gt;&lt;br /&gt;&lt;/p&gt;&lt;br /&gt;&lt;p&gt;&lt;strong&gt;&lt;em&gt;&lt;span style="font-size:130%;color:#009900;"&gt;&lt;br /&gt;Step 12: Edit conf/crawl-urlfilter.txt &lt;/span&gt;&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;# skip ftp: &amp;amp; mailto: urls (file: is not skipped, so that local files can be crawled)&lt;br /&gt;-^(ftp|mailto):&lt;br /&gt;# skip image and other suffixes we can't yet parse&lt;br /&gt;-\.(gif|GIF|jpg|JPG|png|PNG|ico|ICO|css|sit|eps|wmf|zip|ppt|mpg|xls|gz|rpm|tgz|mov|MOV|exe|jpeg|JPEG|bmp|BMP)$&lt;br /&gt;&lt;br /&gt;# skip URLs containing certain characters as probable queries, etc.&lt;br /&gt;-[?*!@=]&lt;br /&gt;&lt;br /&gt;# skip URLs with slash-delimited segment that repeats 3+ times, to break loops&lt;br /&gt;-.*(/[^/]+)/[^/]+\1/[^/]+\1/&lt;br /&gt;&lt;br /&gt;# accept hosts in MY.DOMAIN.NAME&lt;br /&gt;+^file://c:/MySearch/samplefiles/*&lt;br /&gt;+^http://([a-z0-9]*\.)*apache.org/&lt;br /&gt;&lt;br /&gt;# Accept everything else&lt;br /&gt;+.&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;&lt;strong&gt;&lt;em&gt;&lt;span style="font-size:130%;color:#009900;"&gt;Step 13: Edit conf/nutch-site.xml &lt;/span&gt;&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;Make sure to edit at least the following entries:&lt;br /&gt;searcher.dir--&gt; the directory where Nutch’s database will be created. 
All the indexing will be done in this folder.&lt;br /&gt;plugin.includes--&gt; the plugins to load&lt;br /&gt;file.content.limit--&gt; Set it to -1 so that file content is not truncated&lt;br /&gt;http.agent.name--&gt; Give your search agent a name&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Eg:&lt;br /&gt;&lt;p style="MARGIN-LEFT: 15pt"&gt;&amp;lt;property&amp;gt;&lt;/p&gt;&lt;p style="MARGIN-LEFT: 30pt"&gt;&amp;lt;name&amp;gt;searcher.dir&amp;lt;/name&amp;gt;&lt;/p&gt;&lt;p style="MARGIN-LEFT: 30pt"&gt;&amp;lt;value&amp;gt;C:\nutch-0.9\crawl&amp;lt;/value&amp;gt;&lt;/p&gt;&lt;p style="MARGIN-LEFT: 15pt"&gt;&amp;lt;/property&amp;gt;&lt;/p&gt;&lt;p style="MARGIN-LEFT: 15pt"&gt;&amp;lt;property&amp;gt;&lt;/p&gt;&lt;p style="MARGIN-LEFT: 30pt"&gt;&amp;lt;name&amp;gt;plugin.includes&amp;lt;/name&amp;gt;&lt;/p&gt;&lt;p style="MARGIN-LEFT: 30pt"&gt;&amp;lt;value&amp;gt;protocol-file|protocol-httpclient|protocol-http|urlfilter-regex|parse-(text|html|js|msword|pdf)|index-basic|query-(basic|site|url)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)&amp;lt;/value&amp;gt;&lt;/p&gt;&lt;p style="MARGIN-LEFT: 15pt"&gt;&amp;lt;/property&amp;gt;&lt;/p&gt;&lt;p style="MARGIN-LEFT: 15pt"&gt;&amp;lt;property&amp;gt;&lt;/p&gt;&lt;p style="MARGIN-LEFT: 30pt"&gt;&amp;lt;name&amp;gt;file.content.limit&amp;lt;/name&amp;gt;&lt;/p&gt;&lt;p style="MARGIN-LEFT: 30pt"&gt;&amp;lt;value&amp;gt;-1&amp;lt;/value&amp;gt;&lt;/p&gt;&lt;p style="MARGIN-LEFT: 15pt"&gt;&amp;lt;/property&amp;gt;&lt;/p&gt;&lt;p style="MARGIN-LEFT: 15pt"&gt;&amp;lt;property&amp;gt;&lt;/p&gt;&lt;p style="MARGIN-LEFT: 30pt"&gt;&amp;lt;name&amp;gt;http.agent.name&amp;lt;/name&amp;gt;&lt;/p&gt;&lt;p style="MARGIN-LEFT: 30pt"&gt;&amp;lt;value&amp;gt;MySearch&amp;lt;/value&amp;gt;&lt;/p&gt;&lt;p style="MARGIN-LEFT: 30pt"&gt;&amp;lt;description&amp;gt;My Search Engine &amp;lt;/description&amp;gt;&lt;/p&gt;&lt;p style="MARGIN-LEFT: 15pt"&gt;&amp;lt;/property&amp;gt;&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;&lt;strong&gt;&lt;em&gt;&lt;span 
style="font-size:130%;color:#009900;"&gt;Step 14: Run the crawl command &lt;/span&gt;&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;Once all the above steps are done, it is time to run the crawler.&lt;br /&gt;The most common options of the crawl command are:&lt;br /&gt;-dir dir names the directory to put the crawl in.&lt;br /&gt;-threads threads determines the number of threads that will fetch in parallel.&lt;br /&gt;-depth depth indicates the link depth from the root page that should be crawled.&lt;br /&gt;-topN N determines the maximum number of pages that will be retrieved at each level up to the depth.&lt;br /&gt;&lt;br /&gt;You can now run the crawl command from the cygwin console. Run it from C:\nutch-0.9 so that the relative paths ‘urls’ and ‘crawl’ resolve to C:\nutch-0.9\urls and C:\nutch-0.9\crawl.&lt;br /&gt;Open cygwin:&lt;br /&gt;cd c:/&lt;br /&gt;cd nutch-0.9&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;bin/nutch crawl urls -dir crawl -depth 3 -topN 50&lt;/strong&gt;&lt;br /&gt;&lt;/p&gt;&lt;br /&gt;&lt;p&gt;&lt;strong&gt;&lt;em&gt;&lt;span style="font-size:130%;color:#009900;"&gt;&lt;br /&gt;Step 15: Copy the war file to Tomcat &lt;/span&gt;&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;Copy nutch-0.9.war from C:\nutch-0.9\build to Tomcat’s webapps directory.&lt;br /&gt;Restart Tomcat.&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;&lt;strong&gt;&lt;em&gt;&lt;span style="font-size:130%;color:#009900;"&gt;Step 16: Configure nutch-site.xml &lt;/span&gt;&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;Open C:\tomcat\webapps\nutch-0.9\WEB-INF\classes\nutch-site.xml and make sure that searcher.dir is pointing to the crawl directory (the directory you passed with -dir in the crawl command)&lt;br /&gt;&lt;p style="MARGIN-LEFT: 15pt"&gt;&amp;lt;property&amp;gt;&lt;/p&gt;&lt;p style="MARGIN-LEFT: 30pt"&gt;&amp;lt;name&amp;gt;searcher.dir&amp;lt;/name&amp;gt;&lt;/p&gt;&lt;p style="MARGIN-LEFT: 30pt"&gt;&amp;lt;value&amp;gt;C:\nutch-0.9\crawl&amp;lt;/value&amp;gt;&lt;/p&gt;&lt;p style="MARGIN-LEFT: 15pt"&gt;&amp;lt;/property&amp;gt;&lt;/p&gt;Restart Tomcat.&lt;br /&gt;&lt;br 
/&gt;&lt;p&gt;&lt;strong&gt;&lt;em&gt;&lt;span style="font-size:130%;color:#009900;"&gt;&lt;br /&gt;Step 17: Access Nutch search &lt;/span&gt;&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;Open a browser and go to &lt;a href="http://localhost:8080/nutch-0.9"&gt;http://localhost:8080/nutch-0.9&lt;/a&gt;&lt;br /&gt;Enter your search string.&lt;/p&gt;&lt;p&gt;&lt;/p&gt;&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5836842512400214870-1691563815953127060?l=javamilestone.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://javamilestone.blogspot.com/feeds/1691563815953127060/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5836842512400214870&amp;postID=1691563815953127060' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5836842512400214870/posts/default/1691563815953127060'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5836842512400214870/posts/default/1691563815953127060'/><link rel='alternate' type='text/html' href='http://javamilestone.blogspot.com/2008/08/configuring-apache-nutch-to-search.html' title='Configuring Apache Nutch'/><author><name>Aarya</name><uri>http://www.blogger.com/profile/15381592351708404781</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5836842512400214870.post-4158221424148463445</id><published>2008-08-12T07:15:00.000-07:00</published><updated>2008-08-12T07:16:19.349-07:00</updated><title type='text'>Convert Staxsource to StreamSource</title><content type='html'>&lt;pre&gt;&lt;br /&gt;&lt;br /&gt;import java.io.File;&lt;br /&gt;import java.io.FileInputStream;&lt;br /&gt;import java.io.StringReader;&lt;br /&gt;import javax.xml.transform.Source;&lt;br /&gt;import javax.xml.transform.Transformer;&lt;br /&gt;import javax.xml.transform.TransformerFactory;&lt;br /&gt;import javax.xml.transform.stream.StreamResult;&lt;br /&gt;import javax.xml.transform.stream.StreamSource;&lt;br /&gt;&lt;br /&gt;public class SourceConvertor&lt;br /&gt;{&lt;br /&gt;    private static Source convertStaxToStream(Source request)&lt;br /&gt;    {&lt;br /&gt;        TransformerFactory factory = TransformerFactory.newInstance();&lt;br /&gt;        Transformer transformer = null;&lt;br /&gt;        File fp = null;&lt;br /&gt;        FileInputStream fInp = null;&lt;br /&gt;        try&lt;br /&gt;        {&lt;br /&gt;            transformer = factory.newTransformer();&lt;br /&gt;            fp = new File("tempFile.txt");&lt;br /&gt;            transformer.transform(request, new StreamResult(fp));&lt;br /&gt;            fInp = new FileInputStream(fp);&lt;br /&gt;        } catch (Exception e)&lt;br /&gt;        {&lt;br /&gt;            e.printStackTrace();&lt;br /&gt;        }&lt;br /&gt;        return new StreamSource(fInp);&lt;br /&gt;    }&lt;br /&gt;    public static void main(String args[])&lt;br /&gt;    {&lt;br /&gt;        try&lt;br /&gt;        {&lt;br /&gt;            String message ="&amp;lt;author&amp;gt;&amp;lt;name&amp;gt;Rai&amp;lt;/name&amp;gt;&amp;lt;book&amp;gt;GodOfSmallThings&amp;lt;/book&amp;gt;&amp;lt;/author&amp;gt;";&lt;br /&gt;            Source original = new StreamSource(new StringReader(message));&lt;br /&gt;            
Source converted = convertStaxToStream(original);&lt;br /&gt;           &lt;br /&gt;            TransformerFactory factory = TransformerFactory.newInstance();&lt;br /&gt;            Transformer transformer = factory.newTransformer();&lt;br /&gt;            transformer.transform(converted, new StreamResult(System.out));&lt;br /&gt;        }&lt;br /&gt;        catch (Exception e)&lt;br /&gt;        {&lt;br /&gt;            // TODO Auto-generated catch block&lt;br /&gt;            e.printStackTrace();&lt;br /&gt;        }&lt;br /&gt;    }&lt;br /&gt;}&lt;br /&gt;&lt;/pre&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5836842512400214870-4158221424148463445?l=javamilestone.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://javamilestone.blogspot.com/feeds/4158221424148463445/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5836842512400214870&amp;postID=4158221424148463445' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5836842512400214870/posts/default/4158221424148463445'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5836842512400214870/posts/default/4158221424148463445'/><link rel='alternate' type='text/html' href='http://javamilestone.blogspot.com/2008/08/convert-staxsource-to-streamsource.html' title='Convert Staxsource to StreamSource'/><author><name>Aarya</name><uri>http://www.blogger.com/profile/15381592351708404781</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5836842512400214870.post-7410175346952569596</id><published>2008-08-02T11:18:00.001-07:00</published><updated>2008-08-02T11:20:58.615-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Java'/><category scheme='http://www.blogger.com/atom/ns#' term='Basics'/><category scheme='http://www.blogger.com/atom/ns#' term='File'/><title type='text'>Program to write to a file</title><content type='html'>&lt;pre&gt;&lt;br /&gt;package com.milestone.snippets;&lt;br /&gt;&lt;br /&gt;import java.io.BufferedWriter;&lt;br /&gt;import java.io.FileWriter;&lt;br /&gt;import java.io.IOException;&lt;br /&gt;&lt;br /&gt;/**&lt;br /&gt; * @author AARYA&lt;br /&gt; * &lt;br /&gt; */&lt;br /&gt;public class CodeSnippetTester&lt;br /&gt;{&lt;br /&gt; /**&lt;br /&gt;  * @param args&lt;br /&gt;  */&lt;br /&gt; public static void main(String[] args)&lt;br /&gt; {&lt;br /&gt;  try&lt;br /&gt;  {&lt;br /&gt;   BufferedWriter out = new BufferedWriter(new FileWriter("addressbook.txt"));&lt;br /&gt;   out.write("contact1");&lt;br /&gt;   out.close();&lt;br /&gt;  } catch (IOException e)&lt;br /&gt;  {&lt;br /&gt;   e.printStackTrace();&lt;br /&gt;  }&lt;br /&gt; }&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5836842512400214870-7410175346952569596?l=javamilestone.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://javamilestone.blogspot.com/feeds/7410175346952569596/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5836842512400214870&amp;postID=7410175346952569596' title='0 Comments'/><link rel='edit' type='application/atom+xml' 
href='http://www.blogger.com/feeds/5836842512400214870/posts/default/7410175346952569596'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5836842512400214870/posts/default/7410175346952569596'/><link rel='alternate' type='text/html' href='http://javamilestone.blogspot.com/2008/08/program-to-write-to-file.html' title='Program to write to a file'/><author><name>Aarya</name><uri>http://www.blogger.com/profile/15381592351708404781</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5836842512400214870.post-8031891690205734653</id><published>2008-08-02T11:11:00.000-07:00</published><updated>2008-08-02T11:16:18.723-07:00</updated><title type='text'>Program to read text from a file</title><content type='html'>&lt;pre&gt;&lt;br /&gt;package com.milestone.snippets;&lt;br /&gt;&lt;br /&gt;import java.io.BufferedReader;&lt;br /&gt;import java.io.FileReader;&lt;br /&gt;import java.io.IOException;&lt;br /&gt;&lt;br /&gt;/**&lt;br /&gt; * @author AARYA&lt;br /&gt; * &lt;br /&gt; */&lt;br /&gt;public class CodeSnippetTester&lt;br /&gt;{&lt;br /&gt; /**&lt;br /&gt;  * @param args&lt;br /&gt;  */&lt;br /&gt; public static void main(String[] args)&lt;br /&gt; {&lt;br /&gt;  String str;&lt;br /&gt;  try&lt;br /&gt;  {&lt;br /&gt;   BufferedReader in = new BufferedReader(new FileReader(&lt;br /&gt;     "addressbook.txt"));&lt;br /&gt;   // readLine() returns null at end of file, so print each line inside the loop&lt;br /&gt;   while ((str = in.readLine()) != null)&lt;br /&gt;   {&lt;br /&gt;    System.out.println("Output : " + str);&lt;br /&gt;   }&lt;br /&gt;   in.close();&lt;br /&gt;  } catch (IOException e)&lt;br /&gt;  {&lt;br /&gt;   e.printStackTrace();&lt;br /&gt;  }&lt;br /&gt; }&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' 
src='https://blogger.googleusercontent.com/tracker/5836842512400214870-8031891690205734653?l=javamilestone.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://javamilestone.blogspot.com/feeds/8031891690205734653/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5836842512400214870&amp;postID=8031891690205734653' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5836842512400214870/posts/default/8031891690205734653'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5836842512400214870/posts/default/8031891690205734653'/><link rel='alternate' type='text/html' href='http://javamilestone.blogspot.com/2008/08/program-to-read-text-from-file.html' title='Program to read text from a file'/><author><name>Aarya</name><uri>http://www.blogger.com/profile/15381592351708404781</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5836842512400214870.post-4878605769969277685</id><published>2008-08-02T11:05:00.000-07:00</published><updated>2008-08-02T11:11:27.908-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Java'/><category scheme='http://www.blogger.com/atom/ns#' term='Basics'/><category scheme='http://www.blogger.com/atom/ns#' term='IO'/><title type='text'>Program to read text from console</title><content type='html'>&lt;pre&gt;&lt;br /&gt;package com.milestone.snippets;&lt;br /&gt;&lt;br /&gt;import java.io.BufferedReader;&lt;br /&gt;import java.io.IOException;&lt;br /&gt;import java.io.InputStreamReader;&lt;br /&gt;&lt;br /&gt;/**&lt;br /&gt; * @author AARYA&lt;br /&gt; * &lt;br /&gt; */&lt;br /&gt;public class CodeSnippetTester&lt;br /&gt;{&lt;br 
/&gt; /**&lt;br /&gt;  * @param args&lt;br /&gt;  */&lt;br /&gt; public static void main(String[] args)&lt;br /&gt; {&lt;br /&gt;  String str = "";&lt;br /&gt;  try&lt;br /&gt;  {&lt;br /&gt;   BufferedReader in = new BufferedReader(new InputStreamReader(System.in));&lt;br /&gt;   str = in.readLine();&lt;br /&gt;  } catch (IOException e)&lt;br /&gt;  {&lt;br /&gt;   e.printStackTrace();&lt;br /&gt;  }&lt;br /&gt;  System.out.println("The string you just entered is: " + str);&lt;br /&gt; }&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5836842512400214870-4878605769969277685?l=javamilestone.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://javamilestone.blogspot.com/feeds/4878605769969277685/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5836842512400214870&amp;postID=4878605769969277685' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5836842512400214870/posts/default/4878605769969277685'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5836842512400214870/posts/default/4878605769969277685'/><link rel='alternate' type='text/html' href='http://javamilestone.blogspot.com/2008/08/program-to-read-text-from-console.html' title='Program to read text from console'/><author><name>Aarya</name><uri>http://www.blogger.com/profile/15381592351708404781</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5836842512400214870.post-5987817061082381284</id><published>2008-08-02T11:02:00.000-07:00</published><updated>2008-08-02T11:05:15.193-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Java'/><category scheme='http://www.blogger.com/atom/ns#' term='Basics'/><category scheme='http://www.blogger.com/atom/ns#' term='File'/><title type='text'>Program to Delete a file</title><content type='html'>Java Code to delete a file:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;package com.milestone.snippets;&lt;br /&gt;&lt;br /&gt;import java.io.File;&lt;br /&gt;import java.io.IOException;&lt;br /&gt;&lt;br /&gt;/**&lt;br /&gt; * @author AARYA&lt;br /&gt; * &lt;br /&gt; */&lt;br /&gt;public class CodeSnippetTester&lt;br /&gt;{&lt;br /&gt; /**&lt;br /&gt;  * @param args&lt;br /&gt;  */&lt;br /&gt; public static void main(String[] args)&lt;br /&gt; {&lt;br /&gt;  boolean flag = (new File("addressbook.txt")).delete();&lt;br /&gt;  if (!flag)&lt;br /&gt;  {&lt;br /&gt;   System.out.println("Unable to delete the file");&lt;br /&gt;  }&lt;br /&gt; }&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5836842512400214870-5987817061082381284?l=javamilestone.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://javamilestone.blogspot.com/feeds/5987817061082381284/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5836842512400214870&amp;postID=5987817061082381284' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5836842512400214870/posts/default/5987817061082381284'/><link rel='self' type='application/atom+xml' 
href='http://www.blogger.com/feeds/5836842512400214870/posts/default/5987817061082381284'/><link rel='alternate' type='text/html' href='http://javamilestone.blogspot.com/2008/08/program-to-delete-file.html' title='Program to Delete a file'/><author><name>Aarya</name><uri>http://www.blogger.com/profile/15381592351708404781</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5836842512400214870.post-6503468038436033348</id><published>2008-08-02T11:01:00.000-07:00</published><updated>2008-08-02T11:02:10.520-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Java'/><category scheme='http://www.blogger.com/atom/ns#' term='Basics'/><category scheme='http://www.blogger.com/atom/ns#' term='File'/><title type='text'>Program to create a new file</title><content type='html'>Source code to create a new file:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;package com.milestone.snippets;&lt;br /&gt;&lt;br /&gt;import java.io.File;&lt;br /&gt;import java.io.IOException;&lt;br /&gt;&lt;br /&gt;/**&lt;br /&gt; * @author AARYA&lt;br /&gt; * &lt;br /&gt; */&lt;br /&gt;public class CodeSnippetTester&lt;br /&gt;{&lt;br /&gt; /**&lt;br /&gt;  * @param args&lt;br /&gt;  */&lt;br /&gt; public static void main(String[] args)&lt;br /&gt; {&lt;br /&gt;  try&lt;br /&gt;  {&lt;br /&gt;   File file = new File("addressbook.txt");&lt;br /&gt;   boolean success = file.createNewFile();&lt;br /&gt;   if (success)&lt;br /&gt;   {&lt;br /&gt;    System.out.println("New File Created");&lt;br /&gt;   } else&lt;br /&gt;   {&lt;br /&gt;    System.out.println("File Already Exists");&lt;br /&gt;   }&lt;br /&gt;  } catch (IOException e)&lt;br /&gt;  {&lt;br /&gt;   // report the failure instead of silently ignoring it&lt;br /&gt;   e.printStackTrace();&lt;br /&gt;  }&lt;br /&gt; }&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' 
src='https://blogger.googleusercontent.com/tracker/5836842512400214870-6503468038436033348?l=javamilestone.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://javamilestone.blogspot.com/feeds/6503468038436033348/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5836842512400214870&amp;postID=6503468038436033348' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5836842512400214870/posts/default/6503468038436033348'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5836842512400214870/posts/default/6503468038436033348'/><link rel='alternate' type='text/html' href='http://javamilestone.blogspot.com/2008/08/program-to-create-new-file.html' title='Program to create a new file'/><author><name>Aarya</name><uri>http://www.blogger.com/profile/15381592351708404781</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5836842512400214870.post-1439993064943059334</id><published>2008-08-02T10:36:00.000-07:00</published><updated>2008-08-02T10:56:05.266-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Java'/><category scheme='http://www.blogger.com/atom/ns#' term='Basics'/><category scheme='http://www.blogger.com/atom/ns#' term='File'/><title type='text'>Program to check if a file exists or not</title><content type='html'>It is always a good idea to check whether a file exists before you take it as input. 
Below is a code snippet that performs this check:&lt;br /&gt;&lt;font color = Blue&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;package com.milestone.snippets;&lt;br /&gt;&lt;br /&gt;import java.io.File;&lt;br /&gt;&lt;br /&gt;/**&lt;br /&gt; * @author AARYA&lt;br /&gt; * &lt;br /&gt; */&lt;br /&gt;public class FileExistsOrNot&lt;br /&gt;{&lt;br /&gt; /**&lt;br /&gt;  * @param args&lt;br /&gt;  */&lt;br /&gt; public static void main(String[] args)&lt;br /&gt; {&lt;br /&gt;  boolean exists = (new File("inputfile.txt")).exists();&lt;br /&gt;  if (exists)&lt;br /&gt;  {&lt;br /&gt;   System.out.println("The File Exists");&lt;br /&gt;  } else&lt;br /&gt;  {&lt;br /&gt;   System.out.println("The File Does Not Exist");&lt;br /&gt;  }&lt;br /&gt; }&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;/font&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5836842512400214870-1439993064943059334?l=javamilestone.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://javamilestone.blogspot.com/feeds/1439993064943059334/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5836842512400214870&amp;postID=1439993064943059334' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5836842512400214870/posts/default/1439993064943059334'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5836842512400214870/posts/default/1439993064943059334'/><link rel='alternate' type='text/html' href='http://javamilestone.blogspot.com/2008/08/program-to-check-if-file-exists-or-not.html' title='Program to check if a file exists or not'/><author><name>Aarya</name><uri>http://www.blogger.com/profile/15381592351708404781</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry></feed>
