Solr proxy

Solr is a great and powerful search engine based on Lucene. It’s not secure by default, so setting up a proxy to secure the Solr search engine is highly recommended, but not well documented.

This is my implementation.


First, secure your entire Tomcat web admin GUI by requiring authentication. Start by creating a new user in /etc/tomcat6/tomcat-users.xml

<tomcat-users>
  <role rolename="admin"/>
  <role rolename="manager"/>
  <role rolename="proxyUsers"/>
  <user username="jorno" password="XXXXX" roles="admin,manager,proxyUsers"/>
  <user username="proxyUser" password="XXXXXX" roles="proxyUsers"/>
</tomcat-users>

Add the security constraint in the file /etc/tomcat6/web.xml

  <security-constraint>
    <web-resource-collection>
      <web-resource-name>
        Solr authenticated application
      </web-resource-name>
      <!-- No http-method elements: the constraint then covers every HTTP method, not only GET and POST -->
      <url-pattern>/*</url-pattern>
    </web-resource-collection>
    <auth-constraint>
      <!-- Must match a role defined in tomcat-users.xml -->
      <role-name>proxyUsers</role-name>
    </auth-constraint>
  </security-constraint>
  <login-config>
    <auth-method>BASIC</auth-method>
    <realm-name>Basic Authentication</realm-name>
  </login-config>
  <security-role>
    <description>solr users</description>
    <role-name>proxyUsers</role-name>
  </security-role>

Remember to restart tomcat

/etc/init.d/tomcat6 restart

Visit the Solr web GUI served by Tomcat and check that it now asks for a username and password.
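
If you prefer the command line, the same check can be sketched with curl. This is only a rough example: it assumes curl is installed and that Solr answers on 127.0.0.1:8080/solr/select (the address used by the proxy script further down). The SOLR_URL variable and the http_status helper are names made up for this sketch, and XXXXXX stands for the proxyUser password.

```shell
#!/bin/sh
# Assumption: Solr is reachable at the address the proxy script uses.
SOLR_URL="${SOLR_URL:-http://127.0.0.1:8080/solr/select?q=*:*}"

# Hypothetical helper: print only the HTTP status code of a request.
# curl prints 000 when it cannot connect at all.
http_status() {
    curl -s -o /dev/null --max-time 5 -w '%{http_code}' "$@"
}

# Without credentials Tomcat should refuse the request (401).
echo "anonymous: $(http_status "$SOLR_URL")"

# With the proxyUser account the request should go through (200).
echo "proxyUser: $(http_status -u 'proxyUser:XXXXXX' "$SOLR_URL")"
```

If the anonymous request comes back with 200, the security constraint is not being applied.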

Now we create the actual proxy in Apache. First, create a simple site. PHP is required, and mod_rewrite must be enabled. The following example vhost will do the trick. Notice the rewrite rule: it hands any request path, such as /select, to index.php.

# q.example.com - Solr search engine proxy

<VirtualHost *>
        ServerAdmin webmaster@localhost
        DocumentRoot /var/www/q.example.com
        ServerName q.example.com
        ServerAlias solr.example.com

        AddDefaultCharset UTF-8

        ErrorLog /var/log/apache2/error.log

        # Possible values include: debug, info, notice, warn, error, crit,
        # alert, emerg.
        LogLevel warn

        CustomLog /var/log/apache2/access.log combined
        ServerSignature On

        # REWRITES
        RewriteEngine on
        # Debug:
        # RewriteLog /tmp/rwlog
        # RewriteLogLevel 9

        RewriteRule ^/([^/]+)/?                 /index.php                    [NC,L]

</VirtualHost>

Remember to reload apache

/etc/init.d/apache2 reload

Now create an index.php file in the root of the site. Depending on how often you update your index you might want to let the user cache the contents of the proxy. I’ve set 5 minutes here.

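<?php

// Avoid too long a client cache:
// calculate an offset of 5 minutes (in seconds)
$offset = 60 * 5;
// build the Expires header in GMT, not local time, and add the offset
$expire = "Expires: " . gmdate("D, d M Y H:i:s", time() + $offset) . " GMT";

// Fetch the result from Solr as proxyUser.
// Replace [password] with the password set in tomcat-users.xml.
$query = isset($_SERVER["QUERY_STRING"]) ? $_SERVER["QUERY_STRING"] : "";
// @ keeps a PHP warning from being printed before the headers are sent
$result = @file_get_contents('http://proxyUser:[password]@127.0.0.1:8080/solr/select?' . $query);

if ($result === false) {
    // Solr did not answer or rejected the request: say so instead of sending an empty page
    header("HTTP/1.1 502 Bad Gateway");
    exit;
}

// Output the HTTP header, then write the result from Solr
header($expire);
echo $result;

?>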

That’s pretty much it. You can now test the proxy like so:

http://q.example.com/select?q=*%3A*&indent=on
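
The same test can be run from a shell, which also lets you see the Expires header that index.php sets. Again a rough sketch, assuming curl is installed. PROXY_URL and query_proxy are names invented for this example, and q.example.com is the example host from the vhost above.

```shell
#!/bin/sh
# Assumption: the vhost above is reachable under this name.
PROXY_URL="${PROXY_URL:-http://q.example.com/select}"

# Hypothetical helper: send a query through the proxy.
# -G turns the --data-urlencode pairs into a URL-encoded query string,
# -i prints the response headers (including Expires) before the body.
query_proxy() {
    curl -s -i -G --max-time 5 \
        --data-urlencode "q=$1" \
        --data-urlencode 'indent=on' \
        "$PROXY_URL"
}

# Match all documents, same as the URL above.
query_proxy '*:*' || echo "request to $PROXY_URL failed" >&2
```

Note that no username or password is needed here: the proxy adds the credentials itself before talking to Solr.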

P.S. I’m working on a “from scratch” installation guide for solr, so stay tuned!

    • Kai
    • November 8th, 2010

    Hi, not sure if you have seen this but it is very cool:

    http://evolvingweb.github.com/ajax-solr/

    The less cool part is not knowing the best way to get the ajax using a php proxy to solr and getting solr to talk to MySQL.

    If you could do a ground up tutorial that enables the back end for the above tutorial to work then we would have a complete kick-ass ajaxified solr php mysql tutorial. Just an idea :)

    Thanks anyway for this post, it kinda points me in the right direction, I think.

  1. Yes, I’ve used that extensively on one of my sites. I’ll look into making a tutorial as suggested.

    • Kai
    • November 16th, 2010

    I love your findqatar site, very cool design.

  2. Thanks! Wish it would get some more traffic soon though.. =) Slightly overkill with a search engine for the current five ads, hehe.
