Developers, developers, developers... (Part 2)
2010-09-26, 16:20 by Jonas Wallden in Development
"Eat your own dog food" is a popular way to articulate that developers should use their own products in real-world projects. The idea is of course that we should understand the work environment of our customers and make sure the product handles those tasks well. I can guarantee that all of Roxen's developers do this routinely, though over the years one may develop a blind spot for certain shortcomings.
One such shortcoming is a feature the CMS currently lacks, and one that I have personally worked around with command-line tools to the extent that it has become second nature. However, recently seeing other people struggle with the same problem reminded me that we could do better. What I'm referring to is repository searching.
But wait, isn't there a search engine in the CMS?
True, and it was part of the initial plan for this feature, but prototyping has shown that it's not suitable for code searching. It strips information such as punctuation and special characters during indexing, meaning that it will treat <foo-bar/> and <foo bar="..."/> as equal in a search. This is not a problem – rather a good thing! – for plain-text searches, but it makes code searches practically useless. Programmers also need options such as case sensitivity, regular expressions, etc. to separate interesting hits from false matches.
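To make the problem concrete, here is a minimal Python sketch of what a plain-text indexer typically does. The index_tokens function is a hypothetical stand-in, not the CMS search engine's actual tokenizer; it only assumes that indexing lowercases the text and keeps runs of letters and digits.

```python
import re

def index_tokens(text: str) -> list[str]:
    """Hypothetical plain-text indexer: lowercase the input and keep
    only runs of letters and digits, dropping all punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())

# Two different pieces of markup...
tag_tokens = index_tokens("<foo-bar/>")
attr_tokens = index_tokens('<foo bar="..."/>')

# ...end up as the same token list, so the index cannot tell them apart.
same_in_index = tag_tokens == attr_tokens
```

Once the angle brackets, hyphen and quotes are gone, both inputs reduce to the words foo and bar, which is exactly why a word index cannot answer a code search.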
Searching through source code
In the next Roxen CMS 5.0 maintenance release, the new repository search will be found in the Content Editor's Site Navigation menu, which currently houses the filename search field. Beneath that field is a new link that brings up a separate window where you perform the content search. The keyboard shortcut for this link is Shift-F.
Here is the window with the search form that will appear:
The search form provides the following controls:
- Content – Text to search for.
- Path – Restrict searches to paths containing all of the listed words (just like the search field in the Site Navigation menu).
- Options – Toggles for use of Regexp pattern, case sensitivity, and whether read-only mountpoints such as /roxen-files/ should be included.
- File Types – Limit searches by one or more file types.
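The Path rule ("containing all of the listed words") can be sketched in a few lines of Python. The function name path_matches is made up for illustration, and the case-insensitive comparison is an assumption on my part rather than documented behavior.

```python
def path_matches(path: str, words: str) -> bool:
    """Return True if every whitespace-separated word occurs somewhere
    in the path. Case-insensitive matching is an assumption here."""
    lowered = path.lower()
    return all(word in lowered for word in words.lower().split())

# Both words must be present; their order in the path does not matter.
both_present = path_matches("/templates/news/index.xml", "index news")
one_missing = path_matches("/templates/news/index.xml", "news blog")
# An empty Path field places no restriction on the search.
no_filter = path_matches("/templates/news/index.xml", "")
```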
When you start a search it runs in two phases. The system first builds the candidate list of files by querying the Site News database (which knows all file names and their content types). Any matching edit area files are added to the list as well.
In the second phase the list of files is handed over to an external script that performs the actual scan of the files, reporting progress and any hits back to the browser in real time. The window's title bar displays the status while the search is running. You can stop at any time if you have found what you are looking for before all files have been inspected. Doing so will avoid creating unnecessary I/O load on the server.
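The two phases can be illustrated with a small Python sketch. This is not the actual implementation: the catalog dictionary stands in for the database lookup, read_file stands in for disk access, and all names are hypothetical. The point is that a lazy scan lets the caller stop early without touching the remaining files.

```python
import re
from typing import Callable, Iterator

def build_candidates(catalog: dict[str, str], file_types: set[str]) -> list[str]:
    """Phase 1: select paths whose content type is one of the chosen types."""
    return sorted(path for path, ctype in catalog.items() if ctype in file_types)

def scan(paths: list[str],
         read_file: Callable[[str], str],
         pattern: re.Pattern[str]) -> Iterator[tuple[str, int, str]]:
    """Phase 2: scan files lazily, line by line, yielding each hit as
    (path, line number, line) as soon as it is found."""
    for path in paths:
        for lineno, line in enumerate(read_file(path).splitlines(), start=1):
            if pattern.search(line):
                yield path, lineno, line

# Stand-in data: path -> content type, and path -> file content.
catalog = {"/a.xml": "text/xml", "/b.css": "text/css", "/c.xml": "text/xml"}
contents = {
    "/a.xml": "<foo-bar/>\n<p>hi</p>",
    "/b.css": "p { margin: 0 }",
    "/c.xml": "<foo-bar/>",
}

files_read: list[str] = []

def read_file(path: str) -> str:
    files_read.append(path)  # record which files were actually opened
    return contents[path]

candidates = build_candidates(catalog, {"text/xml"})
hits = scan(candidates, read_file, re.compile(re.escape("<foo-bar/>")))

# Take the first hit and stop; /c.xml is never read.
first_hit = next(hits)
```

Because scan is a generator, abandoning it after the first hit means the second candidate is never opened, which mirrors the I/O saving you get from stopping a search early.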
Here is what a typical search result looks like:
Hits are grouped by path in alphabetical order (and separated by language fork if necessary) and prefixed with line numbers. Click on a path to navigate to the corresponding Content Editor page.
Some search tips
The Regexp option is very powerful. Internally this invokes Pike's PCRE library, so by googling PCRE syntax you can learn some neat tricks. For instance, searching for
will find any CSS margin declaration in the range 10-19 pixels by taking advantage of the (a|b) syntax for subpatterns and [\s] for whitespace sequences.
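If you want to experiment with this kind of pattern outside the CMS, here is a small Python sketch. The pattern below is my own illustration of the description above, not necessarily the exact pattern intended in this post. Python's re module is not PCRE, but the constructs used here (alternation, character classes, the ? quantifier) behave the same in both.

```python
import re

# Illustrative pattern: "margin" with an optional side suffix, a colon,
# optional whitespace, then a two-digit value starting with 1, then "px".
margin_10_to_19 = re.compile(r"margin(-(top|right|bottom|left))?:[\s]*1[0-9]px")

lines = [
    "margin: 12px;",      # in range
    "margin-top:19px;",   # in range, with side suffix and no whitespace
    "margin: 20px;",      # out of range
    "margin: 120px;",     # out of range (three digits)
    "padding: 12px;",     # different property
]

matching = [line for line in lines if margin_10_to_19.search(line)]
```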
Keep in mind that files are searched line by line, so you cannot find text that spans multiple lines. This first implementation also has a known issue: file encodings may not always be handled correctly.
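The line-by-line limitation is easy to demonstrate with a short Python sketch (again using Python's re module as a stand-in for the real search script). The same pattern matches when applied to the whole text but finds nothing when each line is tested on its own.

```python
import re

# A tag whose attribute has been wrapped onto the next line.
text = '<div\n  class="teaser">'
pattern = re.compile(r'<div\s+class="teaser">')

# Searching the whole text works because \s+ can consume the newline.
found_in_whole_text = bool(pattern.search(text))

# Searching one line at a time never sees both halves together.
found_line_by_line = any(pattern.search(line) for line in text.splitlines())
```

In practice this means searching for one distinctive fragment, such as class="teaser", rather than for a construct that may be wrapped across lines.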