Developers, developers, developers... (Part 2)

2010-09-26, 16:20 by Jonas Wallden in Development

"Eat your own dog food" is a popular way to articulate that developers should use their own products in real-world projects. The idea is of course that we should understand the work environment of our customers and make sure the product handles those tasks well. I can guarantee that all of Roxen's developers do this routinely, though over the years one may develop a blind spot for certain shortcomings.

One such feature that the CMS is currently missing is something that I personally have worked around using command-line tools to the extent that it's become second nature. However, seeing other people struggle with the same problem recently has reminded me that we could do better. What I'm referring to is repository searching.

It happens every day that I need to find XSLT templates, CSS declarations, RXML variables, JavaScript functions, HTML comments or other pieces of code. Not only is it helpful in sites I've developed myself – it's doubly so in projects where other people have written the templates and stylesheets. The workaround I've lived with is to use grep directly in the repository directory, but that is far from user-friendly and has drawbacks: it returns hits in deleted files and does not understand language forks.

An example of find and grep commands in action.

But wait, isn't there a search engine in the CMS?

True, and it was part of the initial plan for this feature, but prototyping has shown that it's not suitable for code searching. It strips information such as punctuation and special characters during indexing, meaning that it will treat <foo-bar/> and <foo bar="..."/> as equal in a search. This is not a problem – rather a good thing! – for plain-text searches but makes code searches practically useless. Programmers also need options such as case sensitivity and regular expressions to separate interesting hits from false matches.
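To illustrate why punctuation-stripping indexing hurts code searches, here is a minimal sketch in Python. The `index_terms` function is a hypothetical stand-in for a plain-text tokenizer, not the CMS search engine's actual indexer; it simply lowercases the input and keeps runs of letters and digits.

```python
import re


def index_terms(text):
    """Hypothetical plain-text tokenizer: lowercase, keep only letter/digit runs."""
    return re.findall(r"[a-z0-9]+", text.lower())


# Two different pieces of markup collapse to the same index terms.
print(index_terms('<foo-bar/>'))        # ['foo', 'bar']
print(index_terms('<foo bar="..."/>'))  # ['foo', 'bar']
```

Once both inputs reduce to the same terms, no query can tell the tag apart from the attribute, which is fine for prose but not for code.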

Searching through source code

In the next Roxen CMS 5.0 maintenance release, the new repository search is found in the Content Editor's Site Navigation menu, which currently houses the filename search field. Beneath that field is a new link that opens a separate window where you perform the content search. The keyboard shortcut for this link is Shift-F.

Here is the window with the search form that will appear:

The search form provides the following controls:

  • Content – Text to search for.
  • Path – Restrict searches to paths containing all of the listed words (just like the search field in the Site Navigation menu).
  • Options – Toggles for use of Regexp pattern, case sensitivity, and whether read-only mountpoints such as /roxen-files/ should be included.
  • File Types – Limit searches by one or more file types.

When you start a search it runs in two phases. The system first builds the candidate list of files by querying the Site News database (which knows all file names and their content types). Any matching edit area files are added to the list as well.

In the second phase the list of files is handed over to an external script that performs the actual scan of the files, reporting progress and any hits back to the browser in real-time. The window's title bar displays the status while the search is running. You can stop at any time if you have found what you are looking for before all files have been inspected. Doing so will avoid creating unnecessary I/O load on the server.
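The two phases can be sketched roughly as follows. This is an illustrative Python outline, not the actual CMS implementation or its external script: the names `candidate_files`, `scan` and `Hit` are hypothetical, a plain dictionary stands in for the database lookup, and the case-insensitive path matching and the case-insensitive default for content are assumptions.

```python
import re
from typing import Callable, Dict, Iterable, Iterator, List, NamedTuple, Set


class Hit(NamedTuple):
    path: str
    line_no: int
    text: str


def candidate_files(known_files: Dict[str, str], path_words: List[str],
                    file_types: Set[str]) -> List[str]:
    """Phase 1: select paths that contain every word and have an allowed type.

    known_files maps path -> content type; it stands in for the database lookup.
    An empty file_types set means "no type restriction" (an assumption).
    """
    words = [w.lower() for w in path_words]
    return sorted(
        path for path, ctype in known_files.items()
        if all(w in path.lower() for w in words)
        and (not file_types or ctype in file_types)
    )


def scan(paths: Iterable[str], read_lines: Callable[[str], Iterable[str]],
         content: str, use_regexp: bool = False,
         case_sensitive: bool = False) -> Iterator[Hit]:
    """Phase 2: scan each file line by line and yield hits as they are found."""
    flags = 0 if case_sensitive else re.IGNORECASE
    pattern = re.compile(content if use_regexp else re.escape(content), flags)
    for path in paths:
        for line_no, line in enumerate(read_lines(path), start=1):
            if pattern.search(line):
                yield Hit(path, line_no, line.rstrip("\n"))


# Usage with in-memory stand-ins for the repository.
known = {"/css/main.css": "text/css", "/css/print.css": "text/css",
         "/index.xml": "text/xml"}
bodies = {
    "/css/main.css": "h1 { margin: 10px; }\np { color: red; }\n",
    "/css/print.css": "body { MARGIN: 0; }\n",
    "/index.xml": "<page>margin</page>\n",
}
paths = candidate_files(known, ["css"], {"text/css"})
for hit in scan(paths, lambda p: bodies[p].splitlines(True), "margin"):
    print(hit.path, hit.line_no, hit.text)
```

Because `scan` is a generator, the caller can stop consuming hits at any point and the remaining files are never read, which mirrors the point about avoiding unnecessary I/O when a search is stopped early.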

Here is what a typical search result looks like:

Hits are grouped by path in alphabetical order (and separated by language fork if necessary) and prefixed with line numbers. Click on a path to navigate to the corresponding Content Editor page.

Some search tips

The Regexp option is very powerful. Internally this invokes Pike's PCRE library, so by googling PCRE syntax you can learn some neat tricks. For instance, searching for

  margin(|-left|-bottom|-top|-right):(|.*[\s])1[0-9]px

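will find CSS margin declarations in the range 10-19 pixels by taking advantage of the (a|b) syntax for subpatterns and [\s] for a whitespace character. The second group, (|.*[\s]), matches either nothing or any text ending in whitespace, so the value may follow the colon directly or come after other values.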
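If you want to try the pattern outside the CMS, the following sketch runs it against a few sample lines. It uses Python's `re` module rather than PCRE; the two engines differ in places, but the constructs in this pattern behave the same in both. The sample lines are made up for illustration.

```python
import re

pattern = re.compile(r"margin(|-left|-bottom|-top|-right):(|.*[\s])1[0-9]px")

samples = [
    "margin: 10px;",         # value after a space
    "margin-left:12px;",     # value directly after the colon
    "margin: 0 auto 19px;",  # value after other values
    "margin: 20px;",         # outside the 10-19 range
    "margin-top: 5px;",      # single digit
    "padding: 15px;",        # not a margin declaration
]

matches = [s for s in samples if pattern.search(s)]
print(matches)
# ['margin: 10px;', 'margin-left:12px;', 'margin: 0 auto 19px;']
```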

Keep in mind that files are searched line by line, so you cannot find text that spans multiple lines. This first implementation also has a known issue: file encodings may not always be handled correctly.
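The line-by-line limitation is easy to reproduce in a small sketch. The CSS snippet and pattern below are invented for illustration, and Python's `re` stands in for PCRE; the point is only that a pattern which would match across a line break finds nothing when each line is tested on its own.

```python
import re

css = "h1 {\n  margin:\n    10px;\n}\n"
pattern = re.compile(r"margin:\s*10px")

# Testing each line separately, as a line-by-line scan does, finds nothing.
per_line = [n for n, line in enumerate(css.splitlines(), 1) if pattern.search(line)]
print(per_line)  # []

# Testing the whole text at once would match, since \s also covers newlines.
whole_text = pattern.search(css) is not None
print(whole_text)  # True
```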

 


1   Pontus Östlund

2010-09-27 10:18

That's a really nice feature :) Good job!

2   Henry Umansky

2011-02-11 20:56

The "content search" is an excellent addition, especially for those that did not have access to the filesystem to do grep searches. My only complaint is that I feel the link to "Content Search" should also be under the "View" button. Clicking the compass icon on our servers takes a long time (over 45 seconds on some servers).

3   Jonas Wallden

2011-02-12 12:54

Good suggestion! For now you can use the Shift-F keyboard shortcut to avoid waiting for the menu.
