Colourblind

Welcome to Colourblind.

This is the personal web space of Tom Milsom. As much as possible, everything is free (as in speech and as in beer).


Pylighter - Python Syntax Highlighting

Posted by Tom on 31/05/2010 17:11:29

Back when I was first picking up Python, I went looking for some code to syntax-highlight it in blog posts. For C# I use Jean-Claude Manoli's C# Formatter, and I wanted to reuse the same stylesheets. (I've since been distracted by other things, but I thought this was worth finishing off.)

After some brief Googling I found a likely candidate for plagiarism: the syntax highlighter that comes with MoinMoin. But it uses <font> tags and hard-coded colours rather than CSS classes, and both are proven to be carcinogenic to cute little kittens. Clearly this will not do. Someone has to think of the kittens.
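
For anyone who hasn't run into the difference, here is a minimal sketch in Python 3. It is not code from MoinMoin or Pylighter. font_markup bakes a colour into each tag, the way a <font>-based highlighter does. span_markup only names the token's role and leaves the colour to the stylesheet. Both helpers, the category names in CSS_CLASSES and the example colour are hypothetical. The class names kwrd, str, rem and op are the ones Pylighter emits.

```python
import html

# Hypothetical token categories mapped to the class names Pylighter uses;
# the colours themselves live in the stylesheet, not in the markup.
CSS_CLASSES = {"keyword": "kwrd", "string": "str", "comment": "rem", "operator": "op"}


def font_markup(text, colour):
    """Old style: a hard-coded colour baked into every tag."""
    return '<font color="{0}">{1}</font>'.format(colour, html.escape(text, quote=False))


def span_markup(text, category):
    """Class style: the markup only names the token's role."""
    escaped = html.escape(text, quote=False)
    css_class = CSS_CLASSES.get(category)
    if css_class is None:
        return escaped  # unstyled tokens (e.g. plain names) get no wrapper
    return '<span class="{0}">{1}</span>'.format(css_class, escaped)


print(font_markup("import", "#0000FF"))  # <font color="#0000FF">import</font>
print(span_markup("import", "keyword"))  # <span class="kwrd">import</span>
print(span_markup("'a < b'", "string"))  # <span class="str">'a &lt; b'</span>
```

With the first form, changing the colour scheme means regenerating every page. With the second, you only edit the stylesheet.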

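Here is the result of some fairly heavy tweaking: Pylighter. It wraps each token in a <span> whose class (kwrd, str, op or rem) matches the Manoli stylesheet, marks each line number with an lnum span, and leaves the colours to CSS.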

And here it is, formatted by itself:

```python
# Pylighter - monochromacy.net
# HTML syntax highlighting for Python
# based on the MoinMoin Python Source Parser - moinmo.in
# compatible with the Manoli highlighting styles - www.manoli.net/csharpformat/
# Python 2 code: uses print statements, StringIO, cgi.escape and the
# callback form of tokenize.tokenize

import cgi, os, sys, StringIO, webbrowser
import keyword, token, tokenize

# Pseudo token type used to mark NAME tokens that are Python keywords
_KEYWORD = token.NT_OFFSET + 1

# Token type -> CSS class from the Manoli stylesheet
_classes = {
    token.NUMBER:       'str',
    token.OP:           'op',
    token.STRING:       'str',
    tokenize.COMMENT:   'rem',
    token.ERRORTOKEN:   'kwrd',
    _KEYWORD:           'kwrd',
}

class Parser:
    """ Send colored python source.
    """

    def __init__(self, raw, includePreamble, out = sys.stdout):
        """ Store the source text.
        """
        self.raw = raw.expandtabs().strip()
        self.includePreamble = includePreamble
        self.out = out

    def format(self, formatter, form):
        """ Parse and send the colored source.
        (formatter and form are not used.)
        """

        if self.includePreamble:
            self.out.write('<html>\n')
            self.out.write('<head>\n')
            self.out.write('<link rel="stylesheet" type="text/css" href="http://monochromacy.net/Skins/Cbv2/Lib/Css/Style.css" />\n')
            self.out.write('<link rel="stylesheet" type="text/css" href="http://monochromacy.net/Skins/Cbv2/Lib/Css/Code.css" />\n')
            self.out.write('</head>\n')
            self.out.write('<body>\n')

        self.lineNum = 1
        self.newlineRequired = True
        self.colPos = 0

        self.out.write('<div class="code">\n')
        # tokenize calls self (see __call__) once for every token
        tokenize.tokenize(StringIO.StringIO(self.raw).readline, self)
        # Close the <pre> of the final line: the source was stripped, so it
        # ends without a NEWLINE token to close it
        self.out.write('</pre>\n')
        self.out.write('</div>\n')

        if self.includePreamble:
            self.out.write('</body>\n')
            self.out.write('</html>\n')

    def __call__(self, toktype, toktext, (srow,scol), (erow,ecol), line):
        """ Token handler.
        """
        if 0:  # set to 1 to dump every token while debugging
            print "type", toktype, token.tok_name[toktype], "text", toktext,
            print "start", srow,scol, "end", erow,ecol, "<br>"

        # Handle multi-line strings with sneaky recursion: emit one STRING
        # token per physical line, separated by synthetic NEWLINEs, so every
        # <pre> line stays balanced and gets its own line number
        if toktype == token.STRING and '\n' in toktext:
            parts = toktext.split('\n')
            for i, part in enumerate(parts):
                if i == 0:
                    # Keep the real start column so the whitespace before
                    # the opening quotes is written out
                    self(token.STRING, part, (srow, scol), (srow, scol + len(part)), line)
                else:
                    self(token.STRING, part, (0, 0), (0, len(part)), part)
                if i < len(parts) - 1:
                    self(token.NEWLINE, '', (0, 0), (0, 0), part)

            # The last piece ends at column ecol of the closing line, so any
            # token after it on that line (e.g. a closing bracket) carries on
            # from there instead of repeating the string's tail
            self.newlineRequired = False
            self.colPos = ecol
            return

        # Write the line number (right-aligned to 4 characters) if required
        if self.newlineRequired:
            self.out.write('<pre><span class="lnum">{0:>4}:   </span>'.format(self.lineNum))
            self.newlineRequired = False

        # Handle newlines: close this line's <pre> and move to the next line
        if toktype in (token.NEWLINE, tokenize.NL):
            self.out.write('</pre>\n')
            self.lineNum += 1
            self.colPos = 0
            self.newlineRequired = True
            return

        # tokenize drops whitespace, so copy the gap between the previous
        # token and this one straight from the source line
        if scol > self.colPos:
            self.out.write(line[self.colPos:scol])

        # Do some token type wrangling: fold specific operator types into OP
        # and promote keyword NAMEs to _KEYWORD
        if token.LPAR <= toktype <= token.OP:
            toktype = token.OP
        elif toktype == token.NAME and keyword.iskeyword(toktext):
            toktype = _KEYWORD

        # Write the (HTML-escaped) token with the relevant style
        cssClass = _classes.get(toktype)
        text = cgi.escape(line[scol:ecol])
        if cssClass is not None:
            self.out.write('<span class="%s">%s</span>' % (cssClass, text))
        else:
            self.out.write(text)

        # Update the last character position so we can tell when whitespace
        # is dropped
        self.colPos = ecol

if __name__ == "__main__":
    # Usage: pass the path of a Python file; the highlighted page is
    # written next to it as <file>.html
    source = open(sys.argv[1]).read()
    outfile = sys.argv[1] + '.html'

    out = open(outfile, 'wt')
    Parser(source, True, out).format(None, None)
    out.close()

    # Show the result in the default web browser
    webbrowser.open(os.path.abspath(outfile))
```
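
The listing above is Python 2. Print statements, tuple parameters in __call__, the StringIO module and the callback form of tokenize.tokenize are all gone in Python 3, and cgi.escape was removed in 3.8. What follows is a rough Python 3 sketch that produces the same kind of markup. It is not a line-by-line port of Pylighter, and it assumes the same Manoli class names. It uses tokenize.generate_tokens instead of a callback. In place of the recursive trick for multi-line strings, it records the column range each styled token covers on every physical line, then copies whatever lies between those ranges straight from the source. That approach also handles backslash-continued lines. The names highlight, token_class and CLASSES are hypothetical. On Python 3.12 and later, f-strings are tokenized into several pieces (FSTRING_START and friends), so this sketch leaves their quotes and literal text uncoloured.

```python
import html
import io
import keyword
import token
import tokenize

# Token type -> Manoli stylesheet class, mirroring the _classes table above
CLASSES = {
    token.NUMBER: "str",
    token.STRING: "str",
    token.OP: "op",
    tokenize.COMMENT: "rem",
}


def token_class(tok):
    """Pick a CSS class for one token, promoting keyword NAMEs to 'kwrd'."""
    if tok.type == token.NAME and keyword.iskeyword(tok.string):
        return "kwrd"
    return CLASSES.get(tok.type)


def highlight(source):
    """Render Python 3 source as numbered <pre> lines inside <div class="code">."""
    source = source.expandtabs().strip()
    lines = source.split("\n")

    # For every physical line, collect (start_col, end_col, css_class) ranges
    ranges = [[] for _ in lines]
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        css = token_class(tok)
        if css is None:
            continue  # names, NEWLINE, INDENT, ENDMARKER etc. stay unstyled
        (srow, scol), (erow, ecol) = tok.start, tok.end
        for row in range(srow, erow + 1):
            # A multi-line token (e.g. a triple-quoted string) is split so
            # that each <pre> line gets its own balanced <span>
            start = scol if row == srow else 0
            end = ecol if row == erow else len(lines[row - 1])
            ranges[row - 1].append((start, end, css))

    out = ['<div class="code">\n']
    for num, (text, spans) in enumerate(zip(lines, ranges), start=1):
        out.append('<pre><span class="lnum">{0:>4}:   </span>'.format(num))
        pos = 0
        for start, end, css in spans:
            # Text between styled tokens (whitespace, plain names) is copied verbatim
            out.append(html.escape(text[pos:start], quote=False))
            out.append('<span class="{0}">{1}</span>'.format(
                css, html.escape(text[start:end], quote=False)))
            pos = end
        out.append(html.escape(text[pos:], quote=False))
        out.append("</pre>\n")
    out.append("</div>\n")
    return "".join(out)
```

For example, print(highlight(open("example.py").read())) prints the fragment for a file called example.py. You can wrap it in the same <html> and stylesheet <link> preamble the original writes to get a standalone page.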

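So there you go. Run it with the path to a Python file and it writes the highlighted page next to it as <file>.html. If you've already got a stylesheet set up for the Manoli C# formatter and want to reuse it for Python: enjoy.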

Tags: Python
