Django Chop Text Filter

Yesterday I read a post by Ed Eliot that describes an SEO-friendly way of chopping text to an arbitrary number of characters. The method only chops on word boundaries, which avoids ugly, broken sentences.

The comments listing on my blog homepage had been broken for a while. It didn't properly filter out markdown syntax and had other visual anomalies. I was also using Django's simple slice filter to show a 40-character preview of the comment text. This was ugly, so yesterday I implemented Ed's chop text function as a Django filter:

from django import template
from django.utils.html import conditional_escape
from django.utils.safestring import mark_safe
import re

register = template.Library()

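def choptext(value, char='20'):
   # Filter arguments arrive as strings, so convert the limit to an int.
   count = int(char)
   if len(value) > count:
      # The pattern takes up to count - 1 characters plus one non-whitespace
      # character, so the visible part is at most count characters long.
      exp = r'^\s*(.{0,%d}[^\s])\b\s*(.*)' % (count - 1)
      # DOTALL lets '.' span newlines in multi-line comments.
      m = re.compile(exp, re.DOTALL).match(value)
      if m is None:
         # No usable word boundary; return the value untouched.
         return value
      result = '<span class="chopped">%s<span> %s</span></span>' % (conditional_escape(m.group(1)), conditional_escape(m.group(2)))
      return mark_safe(result)
   else:
      return value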

register.filter(choptext)

Read Ed's post for an explanation of how this is intended to work. Basically, it wraps the whole string in an outer span and puts everything after the chop point in a nested span. The leading text is displayed as normal, but the stylesheet hides the nested span using absolute positioning and a large negative text indent. This way search engines and accessibility programs still read the full text, but the visual user won't see the remainder in their browser. Here is the CSS from Ed:

.chopped span {
  position: absolute;
  text-indent: -999em;
}
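
To see where the split actually lands, here is a minimal sketch of the same regular expression outside Django. The function name split_for_chop and the sample sentence are my own for illustration, not from Ed's post, and the sketch leaves out the escaping and span markup entirely.

```python
import re

def split_for_chop(text, limit=20):
    """Split text into (visible, hidden) parts on a word boundary.

    Hypothetical helper: it mirrors the filter's regular expression but
    returns plain strings instead of HTML.
    """
    if len(text) <= limit:
        return text, ""
    # Up to limit - 1 characters plus one non-whitespace character,
    # ending on a word boundary.
    pattern = r"^\s*(.{0,%d}\S)\b\s*(.*)" % (limit - 1)
    match = re.match(pattern, text, re.DOTALL)
    if match is None:
        # No word boundary in range; treat everything as visible.
        return text, ""
    return match.group(1), match.group(2)

print(split_for_chop("The quick brown fox jumps over the lazy dog", 20))
# → ('The quick brown fox', 'jumps over the lazy dog')
```

The first element is what ends up in the outer span, and the second is what the stylesheet pushes off-screen.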

Also check out the link for how to add an ellipsis using a background image, instead of inserting it into the string and muddying the text for search engines and screen readers.

I added this to blog_util.py in the templatetags directory for my blog application. Now in templates I can do this:

{% load blog_util %}

<p>{{ comment.comment|choptext:"40" }}</p>

Markdown

Comments here support markdown syntax, however, and this made my implementation a little trickier. If I don't put the comment text through |markdown, the output contains stray brackets, parentheses, and raw URLs. However, the markdown filter transforms the text in a way that choptext's regular expression didn't like, so this raised an exception:

{{ comment.comment|markdown|choptext:"40" }}

It turns out some whitespace was being added to the beginning of the string, which broke the regular expression. I modified the expression to skip leading whitespace (the \s* at the start), which is reflected in the above snippet. If you're going to use this, though, you'll probably also want to strip the HTML tags from markdown's output so choptext doesn't escape them. Otherwise the output is littered with escaped <p> and <a> tags. Do something like this instead:

{{ comment.comment|markdown|striptags|choptext:"40" }}
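
The whitespace failure is easy to reproduce with the re module alone. This is a rough sketch: the rendered string is a made-up stand-in for markdown output, and I'm assuming the stray whitespace was a leading newline (leading spaces alone would still match, because '.' accepts them).

```python
import re

# Hypothetical stand-in for markdown output that starts with a newline.
rendered = "\n<p>A comment that is long enough to need chopping down</p>"

def matches(pattern, text):
    """Return True if the pattern matches at the start of the text."""
    return re.match(pattern, text) is not None

strict = r'^(.{0,39}[^\s])\b\s*(.*)'      # no allowance for leading whitespace
lenient = r'^\s*(.{0,39}[^\s])\b\s*(.*)'  # skips leading whitespace first

print(matches(strict, rendered))   # False
print(matches(lenient, rendered))  # True
```

When the match fails, re returns None, and calling .group() on None is what raises the exception.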

For more information on writing custom Django filters, see the documentation.


All content licensed under a Creative Commons Attribution-Share Alike 3.0 License unless otherwise noted.