Apache Solr: A Truly Advanced Search

What is Apache Solr and how do you run it locally? How do you get Drupal data into Apache Solr?
These are questions you may have asked yourself at one time or another, at least if you're a developer. I hope to answer them, as well as cover:

•  How to search Apache Solr from Drupal
•  How to modify what’s searched and the results
•  Theming search results
•  Fabulous-looking UI and tricks

Apache Solr is an open-source, enterprise search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, and rich document (e.g., Word, PDF) handling. Solr is highly scalable, providing distributed search and index replication, and it powers the search and navigation features of many of the world’s largest Internet sites.

With the Apache Solr module for Drupal, you can integrate Drupal with the Apache Solr search platform. Solr search can be used as a replacement for core content search and boasts both extra features and better performance. Among the extra features is the ability to have faceted searches ranging from content author to taxonomy to arbitrary CCK fields.

Drupal interacts with Solr via HTTP and sends data to Solr as XML. Documents are POSTed to Solr's /update handler inside an <add> message, and deletions can be POSTed to the same /update handler inside a <delete> message.
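
To make the exchange concrete, here is a minimal Python sketch (standard library only) that builds the two kinds of message in Solr's XML update format: an <add> message carrying documents and a <delete> message naming documents by their unique key. The field names and the "node/1" ID are hypothetical examples, not the module's actual ID scheme. The sketch only builds the message bodies; sending them would mean POSTing them to your Solr server's update handler.

```python
import xml.etree.ElementTree as ET


def build_add(docs):
    """Build a Solr <add> message from a list of {field_name: value} dictionaries."""
    add = ET.Element("add")
    for fields in docs:
        doc = ET.SubElement(add, "doc")
        for name, value in fields.items():
            # Multi-valued fields are sent as repeated <field> elements with the same name.
            values = value if isinstance(value, list) else [value]
            for v in values:
                field = ET.SubElement(doc, "field", name=name)
                field.text = str(v)
    return ET.tostring(add, encoding="unicode")


def build_delete(ids):
    """Build a Solr <delete> message that removes documents by their unique key."""
    delete = ET.Element("delete")
    for doc_id in ids:
        ET.SubElement(delete, "id").text = str(doc_id)
    return ET.tostring(delete, encoding="unicode")


# Hypothetical node with a multi-valued list of taxonomy term IDs.
print(build_add([{"id": "node/1", "title": "Hello Solr", "tid": [3, 7]}]))
print(build_delete(["node/1"]))
```

The "tid" list shows how a multi-valued field ends up as two <field name="tid"> elements in the same document.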

The module comes with schema.xml and solrconfig.xml files which must be used in your Solr installation in order to get the module to work correctly. The schema defines the types and fields to be used.

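Example of a field:
<field name="id" type="string" indexed="true" stored="true" />

•  name: the field name; "id" is required for indexing because it is the unique key that identifies each document.
•  indexed: the field is searchable; a field that is not indexed will not be found by a search.
•  stored: the original value is saved so it can be returned in results; a field that is not stored cannot be retrieved.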

Example of a dynamic field:
<dynamicField name="is_*" type="integer" indexed="true" stored="true" multiValued="false" />
multiValued: whether the field can store multiple values; if false, it is a single-value field.
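
The following is a simplified Python model, not Solr's actual code, of how Solr chooses a dynamicField for a field name that the schema does not declare explicitly. It assumes patterns that start or end with "*", tries longer patterns first, and falls back to a catch-all "*" pattern of type "ignored". The pattern table is an illustrative subset, not the module's full schema.

```python
# Illustrative subset of dynamic field patterns: pattern -> (type, multiValued).
DYNAMIC_FIELDS = {
    "is_*": ("integer", False),
    "ss_*": ("string", False),
    "*": ("ignored", True),
}


def matches(pattern, name):
    """True if a leading- or trailing-"*" pattern matches the field name."""
    if pattern.endswith("*"):
        # A bare "*" has an empty prefix, so it matches every name.
        return name.startswith(pattern[:-1])
    if pattern.startswith("*"):
        return name.endswith(pattern[1:])
    return pattern == name


def resolve(name, patterns=DYNAMIC_FIELDS):
    """Return (pattern, definition) for the longest matching pattern, or None."""
    # sorted() is stable, so equal-length patterns keep their declaration order.
    for pattern in sorted(patterns, key=len, reverse=True):
        if matches(pattern, name):
            return pattern, patterns[pattern]
    return None


print(resolve("is_uid"))  # ('is_*', ('integer', False))
```

"is_uid" matches both "is_*" and "*", and the longer, more specific pattern wins.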

To create a sortable copy of the ss_* dynamic string fields, use:
<copyField source="ss_*" dest="sort_ss_*" />

By giving a dynamicField the name "nodeaccess", you declare the field that stores node access permissions.

In order to get rid of data that is not used, you can "ignore" it:
<dynamicField name="*" type="ignored" multiValued="true" />
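
As a hedged sketch of the copyField and "ignored" rules, this Python model keeps fields that match a declared field or dynamic prefix, drops everything else (as the catch-all "ignored" field would), and expands the wildcard copyField so that a field such as ss_name is also copied to sort_ss_name. The declared names and prefixes are an illustrative subset, and only trailing-"*" copyField patterns are handled.

```python
# Illustrative schema subset: explicitly declared fields and dynamic prefixes.
DECLARED_FIELDS = {"id"}
DYNAMIC_PREFIXES = ("is_", "ss_")
# (source, dest) pairs; only trailing-"*" patterns are handled in this sketch.
COPY_FIELDS = [("ss_*", "sort_ss_*")]


def apply_schema(doc):
    """Return the fields Solr would keep for doc, plus the copyField copies."""
    kept = {}
    for name, value in doc.items():
        if name in DECLARED_FIELDS or name.startswith(DYNAMIC_PREFIXES):
            kept[name] = value
        # Any other name falls through to the catch-all "*" field of type
        # "ignored", so it is silently dropped.
    for source, dest in COPY_FIELDS:
        prefix = source[:-1]
        for name, value in list(kept.items()):
            if name.startswith(prefix):
                # The "*" in dest is replaced by the part the source "*" matched.
                kept[dest.replace("*", name[len(prefix):])] = value
    return kept


print(apply_schema({"id": "node/1", "ss_name": "Alice", "junk": "x"}))
# {'id': 'node/1', 'ss_name': 'Alice', 'sort_ss_name': 'Alice'}
```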

This module depends on Drupal core's search framework. However, you may not want the core searches and only want Solr search. If that is the case, use the Core Searches module in tandem with this module.

If you’re looking for Solr PHP integration, this is possibly the best option available. This is also one of the best ways to achieve a faceted search. In addition, since you can shift the load of searches from PHP+SQL to a totally separate server, using Solr can help to scale Drupal for large, high-traffic sites.

You can add any data to the index by using:
hook_apachesolr_update_index($document, $node, $namespace).

This is used to add more data to a document before it is sent to Solr, and it can also alter or replace data added by the Apache Solr module or another module. It works like an _alter hook.
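
The alter-style behaviour can be modelled in a few lines of Python. This illustrates the pattern only and is not Drupal's API: each registered implementation (both hypothetical here) receives the same document before it is sent to Solr and may add, change or replace fields. The field name is_word_count is a made-up example that would match an is_* dynamic field.

```python
# Hypothetical registry standing in for Drupal's hook discovery.
HOOK_IMPLEMENTATIONS = []


def implements_update_index(func):
    """Register a function as an implementation of the update-index hook."""
    HOOK_IMPLEMENTATIONS.append(func)
    return func


@implements_update_index
def mymodule_update_index(document, node, namespace):
    # Hypothetical: add a word count, matching an is_* dynamic field.
    document["is_word_count"] = len(node["body"].split())


@implements_update_index
def othermodule_update_index(document, node, namespace):
    # Hypothetical: replace a value that was added earlier in the chain.
    document["title"] = document["title"].strip().title()


def build_document(node, namespace=None):
    """Build the base document, then let every implementation alter it in place."""
    document = {"id": node["nid"], "title": node["title"]}
    for hook in HOOK_IMPLEMENTATIONS:
        hook(document, node, namespace)
    return document


print(build_document({"nid": 1, "title": "  hello solr ", "body": "one two three"}))
# {'id': 1, 'title': 'Hello Solr', 'is_word_count': 3}
```

Because every implementation mutates the same object, the order in which they run decides which change wins.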

If you want to exclude entire content types, you can simply use the UI that ships with the Apache Solr module, at this location:
/settings/apachesolr/admin

The UI is the easiest way to integrate Apache Solr with Drupal, but if you want more precise control over indexing, you can use these additional hooks:

•  hook_apachesolr_node_exclude()
•  hook_nodeapi()
•  hook_apachesolr_document_handlers()
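
A rough Python model of how the two layers combine: content types excluded in the UI are skipped outright, and per-node rules (the role an exclude hook plays) can skip individual nodes. The excluded types and the "skip unpublished nodes" rule are hypothetical examples; check the module's API documentation for the exact hook contract.

```python
# Hypothetical: content types unticked in the module's admin UI.
EXCLUDED_TYPES = {"page", "webform"}


def exclude_unpublished(node):
    """Hypothetical per-node rule: return True to keep the node out of the index."""
    return not node["published"]


EXCLUDE_RULES = [exclude_unpublished]


def should_index(node):
    """Apply the coarse content-type filter first, then the per-node rules."""
    if node["type"] in EXCLUDED_TYPES:
        return False
    return not any(rule(node) for rule in EXCLUDE_RULES)


print(should_index({"type": "story", "published": True}))  # True
```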

For more information on this, visit: http://evolvingweb.ca/story/apache-solr-mastery-how-add-custom-search-pa…

To index CCK field data, or to facet on a CCK field, the Apache Solr module needs four things:

1. The data type to use in the index
2. The CCK widget types to use during indexing
3. An indexing callback function used to extract the data from the CCK field
4. A display callback function used to display the data from the CCK field

hook_apachesolr_cck_fields_alter parameter definitions:

•  display_callback (optional)
•  facet_block_callback (optional)
•  index_type (required)
•  indexing_callback (required)
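
To illustrate the required and optional keys listed above, here is a small Python validator for a single field mapping. It only checks key names; the callback names in the example are hypothetical, and the real hook is PHP code that alters the module's mapping array.

```python
REQUIRED_KEYS = {"index_type", "indexing_callback"}
OPTIONAL_KEYS = {"display_callback", "facet_block_callback"}


def validate_mapping(mapping):
    """Return a list of problems with one CCK field mapping (empty if it looks valid)."""
    keys = set(mapping)
    problems = [f"missing required key: {k}" for k in sorted(REQUIRED_KEYS - keys)]
    problems += [f"unknown key: {k}" for k in sorted(keys - REQUIRED_KEYS - OPTIONAL_KEYS)]
    return problems


example = {
    "index_type": "string",
    "indexing_callback": "mymodule_indexing_callback",  # hypothetical callback name
    "display_callback": "mymodule_display_callback",    # hypothetical callback name
}
print(validate_mapping(example))  # []
```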

For more information on indexing CCK fields, go to: http://drupal.org/node/776750

See the documentation in the handbook as well as the included README.txt for information on requirements and installation.

Drupal Page-Speed Optimization and Load Time

A few tips for improving page-load speed:
  • Run tests to track overall speed and see where your site can improve
  • Make fewer HTTP requests
  • Use a CDN
  • Increase front-end speed
  • Increase back-end speed
  • Use monitoring tools

These subjects are extensive on their own and will only be covered briefly in this article. For further, in-depth information, visit: http://wimleers.com/article/improving-drupals-page-loading-performance.

When testing a page’s load time, you will want to track every little nook and cranny. To track all of the HTTP requests, you can use http://webpagetest.org. The test will be conducted from the location specified, and you will be provided a waterfall diagram of your page’s load performance as well as a comparison against an optimization checklist. Please visit the PageTest wiki page for more information. (Sample results for AOL.com can be seen here.)

You can also test the objects loading on your browser by using BrowserScope. BrowserScope is a community-driven project for profiling web browsers with goals of fostering innovation by tracking browser functionality and serving as a resource for web developers.

Here are some free tools that you can use to evaluate the speed of your site:

  • Page Speed, an open source Firefox/Firebug add-on that evaluates the performance of web pages and gives suggestions for improvement.
  • YSlow, a free tool from Yahoo! that suggests ways to improve website speed.
  • WebPagetest shows a waterfall view of your page’s load performance plus an optimization checklist.
  • In Google Webmaster Tools, Labs > Site Performance shows the speed of your website as experienced by users around the world.

A good way to lower the number of HTTP requests is to take advantage of Drupal's ability to aggregate CSS and JS files into fewer files and compress them (by stripping comments and whitespace). You can also host static content on different domains and use Google's Custom Search.
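
Aggregation is what saves requests: several stylesheets become one file, so the browser makes one HTTP request instead of many. This naive Python sketch shows the idea (concatenate, strip /* */ comments, collapse whitespace). It is not Drupal's implementation and does not protect comment-like text inside strings or url() values.

```python
import re


def aggregate_css(sources):
    """Combine several CSS sources into one string with comments and extra whitespace removed."""
    combined = "\n".join(sources)
    combined = re.sub(r"/\*.*?\*/", "", combined, flags=re.S)  # drop /* ... */ comments
    combined = re.sub(r"\s+", " ", combined)                   # collapse whitespace runs
    combined = re.sub(r"\s*([{};,])\s*", r"\1", combined)      # trim around braces, ; and ,
    return combined.strip()


files = ["/* base */\nbody {\n  margin: 0;\n}", "a { color: red; }"]
print(aggregate_css(files))  # body{margin: 0;}a{color: red;}
```

Spaces around ":" are left alone on purpose, because in selectors such as "a :hover" that space changes the meaning.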

A content delivery network, or content distribution network (CDN), is a system of computers containing copies of data placed at various points in a network to maximize bandwidth for access to the data from clients throughout the network. A client accesses a copy of the data near to it, rather than all clients accessing the same central server, which avoids a bottleneck near that server. On a Drupal installation, you can use the SimpleCDN module to do this.
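
CDN integration modules generally work by rewriting the URLs of static files so they point at the CDN instead of your web server. A minimal Python sketch of that rewriting, assuming a hypothetical CDN host cdn.example.com and a short, illustrative list of static file extensions:

```python
from urllib.parse import urlsplit, urlunsplit

CDN_HOST = "cdn.example.com"  # hypothetical CDN hostname
STATIC_EXTENSIONS = (".css", ".js", ".png", ".jpg", ".gif")  # illustrative list


def cdn_url(url, cdn_host=CDN_HOST):
    """Point static-file URLs at the CDN host; return other URLs unchanged."""
    parts = urlsplit(url)
    if not parts.path.lower().endswith(STATIC_EXTENSIONS):
        return url
    # Keep the path and query string so cache-busting parameters still work.
    return urlunsplit((parts.scheme or "https", cdn_host, parts.path, parts.query, parts.fragment))


print(cdn_url("/sites/default/files/logo.png"))
print(cdn_url("/node/1"))
```

Dynamic pages such as /node/1 stay on the origin server; only files that every visitor downloads identically are sent to the CDN.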

To increase front-end and back-end speed, you will need to tune the configuration of all your modules. This means configuring Drupal's performance settings and performance modules, as well as the server itself.

Once the site has been optimized, you will want to monitor it. When monitoring, look for trends, check capacity and load, and review changes that affect performance. Then analyze those trends to find failures and track uptime. For example, if your site was running optimally and a few months later you add a new feature, monitoring will show how the new feature affected overall performance.
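
For the before-and-after comparison described above, a simple check is to compare average load times from two monitoring periods. A minimal Python sketch, assuming you already have load-time samples in milliseconds and treating a slowdown of more than 10% (an arbitrary threshold) as a regression:

```python
from statistics import mean


def load_time_regression(before_ms, after_ms, tolerance=0.10):
    """Compare mean load times (ms) from two periods.

    Returns (relative_change, is_regression); a positive change means slower pages.
    """
    old, new = mean(before_ms), mean(after_ms)
    change = (new - old) / old
    return change, change > tolerance


# Hypothetical samples from before and after a new feature was deployed.
print(load_time_regression([800, 820, 780], [950, 1000, 1050]))
```

With these hypothetical samples averaging 800 ms before and 1000 ms after, the function reports a 25% slowdown and flags it.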

The rewards of faster page loads are well worth the effort. A faster site can serve more visitors, and Google also takes site speed into account in its search rankings, so a fast-serving site can rank higher.

Tools:

  • YSlow for Firebug – grades your site against Yahoo!'s best practices
  • Page Speed for Firebug (Google) – overall performance summary (Steve Souders)
  • webpagetest.org – best for waterfall diagrams; talks to the server, simulates connection speeds and records a visual representation of the page loading
  • Apache Bench – gives an impression of how your current Apache installation performs, showing how many requests per second it serves ( ab -n # -c # http://[site]/ )
  • JMeter – load-testing tool – log, ssh, simulate authenticated users
  • Devel module – checks query time and page and data execution time, reports Views statistics separately, works for authenticated users
  • Parallels module
  • Browserscope.org – number of objects
  • Media Mover module – performance module settings and how they work
  • Cache Router – uses fast path settings, bypassing the default cache
  • Reverse proxy – advanced step: Squid/Varnish (node/91813) – high-performance range
  • Memcache module – caches data in memory as key/value pairs – the memory may live on other servers
  • Boost – extension of the performance module – instead of caching results in a table, stores them as files, bypassing PHP and MySQL – limited to anonymous visitors – good for surviving a Slashdot-style traffic spike, but not for sites with a high number of authenticated visitors – uses Apache mod_rewrite directives to check whether the request is a GET (retro mode)
  • Back end: PHP accelerator – APC (Alternative PHP Cache) is a free, open and robust framework for caching and optimizing PHP intermediate code – XCache and eAccelerator are other options – check the configuration for proper memory settings
  • MySQL caching – enable the MySQL query cache and give it memory – index slow queries that run often – increase the InnoDB buffer pool (key_buffer is important for temporary tables) – core search runs better on MyISAM (but don't use core search)
  • Apache – extend mod_expires settings
  • Apache – enable deflate_module (mod_deflate)
  • Alternative to Apache – nginx with its gzip module
  • Nagios – monitoring tool
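
Apache Bench's headline numbers are simple arithmetic over the request count (-n), the concurrency level (-c) and the total test time. This Python sketch reproduces them so you can sanity-check a run; the inputs in the example are made up, not measured results.

```python
def ab_summary(requests, concurrency, total_seconds):
    """Recompute Apache Bench's headline figures from -n, -c and the total test time."""
    return {
        # Completed requests divided by wall-clock time.
        "requests_per_second": requests / total_seconds,
        # Mean time one client waits per request, in milliseconds.
        "time_per_request_ms": concurrency * total_seconds * 1000 / requests,
        # Mean time per request across all concurrent requests, in milliseconds.
        "time_per_request_all_ms": total_seconds * 1000 / requests,
    }


print(ab_summary(requests=1000, concurrency=10, total_seconds=4.0))
```

For example, 1000 requests at a concurrency of 10 finishing in 4 seconds works out to 250 requests per second, 40 ms per request as seen by each client, and 4 ms per request across all concurrent requests.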

For more support, visit:

http://wimleers.com/article/improving-drupals-page-loading-performance

http://drupal.org/project/CDN/Simplecdn

http://drupal.org/project/cdn

http://twobits.com/
(blogs about performance and security)

Drupal Accessibility

What is user accessibility? The basic idea is that a website should be 100% accessible, which means every user can use and interact with it completely. In developing websites, we strive for uniformity across all browsers, but rarely consider uniformity for all users. The truth is that roughly one in five users has a disability and, in most cases, two or more types of disabilities.

If these disabilities are not addressed during development, a website can lose 20% of its potential users.

The fundamentals of web accessibility standards are set by the W3C, though they are not limited to it.

The W3C documentation can be found at Web Accessibility Initiative (WAI). The current web content accessibility guidelines are located in a document at Web Content Accessibility Guidelines (WCAG) 2.0.

When developing a website, you want to begin with the end in mind. Is the navigation accessible, user-friendly and self-explanatory? Do the CTA icons orient users as to where they are within the site's framework?

The four principles for creating an accessible website (P.O.U.R.):

  • Perceivable – Is the content accessible via screen reader, audio, captions and braille?
  • Operable – Can the user perform all the actions needed to navigate through the site?
  • Understandable – Does the user get confused by the site's actions?
  • Robust – Will the interface remain understandable, operable, and accessible in the future as technologies advance?

A user can be deaf, blind, partially blind, color blind, partially color blind, and the list goes on. So, in order to make sure content is Perceivable, the site must be developed by keeping in mind the tools a visitor will use.

Blind and partially blind users rely on screen readers (NVDA, JAWS, Thunder). These screen readers depend on W3C-compliant HTML: without proper XHTML formatting and structure, they don't function properly. For instance, improper use of heading tags, alt text, and form and table structure will cause confusion as to what users are perceiving. In alt text, be descriptive and make the message user-friendly. For example, an icon to purchase a ticket could read "Purchase icon" or "Click here to buy icon". One recommended practice for blind users is to create skip links (anchor links, often visually hidden, that let users jump past repeated navigation straight to the main content).
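
Some of these structural problems can be caught automatically. Below is a minimal Python sketch, using the standard library's html.parser, that flags <img> tags with no alt attribute and headings that skip a level (for example h1 followed by h3). It checks only those two things and is no substitute for testing with a real screen reader; an empty alt="" is deliberately allowed, since it marks decorative images.

```python
from html.parser import HTMLParser


class AccessibilityAudit(HTMLParser):
    """Collect missing-alt and skipped-heading-level problems while parsing."""

    def __init__(self):
        super().__init__()
        self.problems = []
        self.last_heading = 0

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img" and "alt" not in attrs:
            self.problems.append(f"img without alt: {attrs.get('src', '?')}")
        if len(tag) == 2 and tag[0] == "h" and tag[1].isdigit():
            level = int(tag[1])
            # Going deeper by more than one level (h1 -> h3) confuses screen reader outlines.
            if self.last_heading and level > self.last_heading + 1:
                self.problems.append(f"heading jumps from h{self.last_heading} to h{level}")
            self.last_heading = level


def audit(html):
    parser = AccessibilityAudit()
    parser.feed(html)
    return parser.problems


print(audit('<h1>Tickets</h1><h3>Buy</h3><img src="buy.png">'))
```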

For users with partial or full color blindness, avoid relying on color cues alone, because they may not notice them. Text formatting cues such as bold, borders, or underlines are more appropriate. Also be careful with link colors, since some users may not be able to identify links.
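
Color choices can also be checked numerically. WCAG 2.0 defines a contrast ratio between text and background colors, and its level AA asks for at least 4.5:1 for normal-size text. Here is a Python sketch of that formula; good contrast helps users with low vision or color deficiencies, but it complements rather than replaces the non-color cues described above.

```python
def relative_luminance(hex_color):
    """WCAG 2.0 relative luminance of an sRGB color written as '#rrggbb'."""
    hex_color = hex_color.lstrip("#")
    channels = [int(hex_color[i:i + 2], 16) / 255 for i in (0, 2, 4)]
    # Undo sRGB gamma encoding, using the WCAG 2.0 threshold of 0.03928.
    linear = [c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4 for c in channels]
    r, g, b = linear
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg, bg):
    """WCAG 2.0 contrast ratio, from 1 (no contrast) to 21 (black on white)."""
    lighter, darker = sorted((relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)


print(round(contrast_ratio("#777777", "#ffffff"), 2))
```

Black on white gives the maximum ratio of 21:1, while mid-grey #777777 on white comes out just under 4.5:1, so it narrowly fails AA for normal text.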

Recently, YouTube has pushed for closed captioning for deaf users. This matters a great deal for audio and video content. One recommendation is to provide text equivalents of audio files and link to them.

People with disabilities sometimes use special equipment to browse the web, such as keyboards with only basic keys like the arrow keys and Enter (no Cmd/Windows/Apple key), head wands, or chin-operated mice. When developing a site, make sure you are not disabling or limiting these tools. For instance, don't override the Enter key on an online form: Enter should submit the form (its default behavior).

One of the best ways to create a user-friendly experience for visitors with disabilities is to use WAI-ARIA. According to www.w3.org, “WAI-ARIA, the Accessible Rich Internet Applications Suite, defines a way to make Web content and Web applications more accessible to people with disabilities. It especially helps with dynamic content and advanced user interface controls developed with Ajax, HTML, JavaScript, and related technologies. Currently certain functionality used in Web sites is not available to some users with disabilities, especially people who rely on screen readers and people who cannot use a mouse. WAI-ARIA addresses these accessibility challenges, for example, by defining new ways for functionality to be provided to assistive technology. With WAI-ARIA, developers can make advanced Web applications accessible and usable to people with disabilities”.

The cool thing about Drupal is that it made significant strides toward accessibility in version 6 and went even further in version 7. Case in point: Drupal 7 provides the element-invisible class, which hides content visually while keeping it available to screen readers, braille displays and other assistive technology. (Its counterpart, element-hidden, hides content from all users, including screen reader users.)

A good way to test your site's accessibility is to test it with users who have disabilities. If you can't do that, the next best thing is a "squint test": turn off JavaScript, Flash and CSS, unplug your mouse, and then try to navigate through your site. You can check visual output with the Vischeck tool. The Fangs add-on for Firefox is also a nice tool for checking how JAWS would read your pages. Another good route is checking your XHTML with the W3C validators.

The biggest benefit of creating a disability-friendly site is more users and more traffic. That's the ultimate goal, right?!

Check out the accessibility chatter on IRC and Twitter: just search for drupal-accessibility.

The Honda CRX Project: Rear Seats

So the deal with my wife was that I was allowed to buy the car if I could get the kids into it. So from day one, I've been on a mission to put in rear seats.

From all of my research, I've come up with two solutions: buy the JDM rear seats for roughly $400, or fabricate something…

Well, I went the cheap, creative route and decided to just get something I could throw back there. After some more extensive research, I found that the 1990 Integra rear seats would fit perfectly. So naturally, I went to a junkyard that luckily had a totaled 1990 Integra.

So I stripped out the rear seats and the rear seat bottoms. I also stripped out the seat belts and bolts.

Once that was all done, I grouped up with my friend JP to figure out how we should install the seats.

The seats fit just perfectly! They even kept their fold-down function and can be taken out of the car at any time.

So we finished installing the rear seats. Now all that's left is installing the seat belts.

(I will write more and add pictures later.)