Faster Regexes: What to do when text matching is your bottleneck

By Aaron Crane (arc) from Edinburgh.pm
Date: Thursday, 14 August 2008 15:10
Duration: 30 minutes
Tags: optimisation regex regexp


We all know how good Perl is at munging text. But what do you do when your Perl text-munging code isn't fast enough for what you're trying to do?

We needed to extract useful information from tens of gigabytes of web-server log files. Our Perl code was simple and obvious, but not fast enough for our purposes. When profiling revealed a frequently executed regex as the bottleneck, we tried several things to make it faster.

This talk looks at what we did to speed up our regex-heavy code (by a factor of well over 100 in some places), identifying a few general-purpose optimisation techniques along the way.


Attended by: Vincent Pit (vincent), Léon Brocard (acme), Andrew Shitov (ash), Anton Berezin (Grrrr), Barbie, Jacob Bunk Nielsen, Steffen Mueller, Stefan Hornburg (racke), Casiano Rodriguez-Leon (casiano), Damian Conway (damian), Sue Spence (virtualsue), Francoise Dehinbo (franky), Michael Zedeler (mzedeler), Rasmus Hansen (rasmoo), Dmitry Karasik (McFist), Karen Pauley, Alex Balhatchet (Kaoru), William Travis Holton, Stéphane Payrard (cognominal), Alberto Simões (ambs), Alex Kapranoff (kappa), Arne Sommer (Arne), Martin Schipany (ElCondor), Stefan Hanski, Darius Jokilehto, Stan Sawa, Jörg Plate (Patterner), allan dystrup (ady), Patrick Michaud (Pm), Wendy Van Dijk (woolfy), David Jack Wange Olrik (davidolrik), Andreas Hetey, Mark Fowler (Trelane), Gertraud Unterreitmeier (Gertraud), Nicholas Clark, Bart Lateur, Lars Dɪᴇᴄᴋᴏᴡ 迪拉斯 (daxim), Matija Grabnar (matija), Søren Døygaard, Erik Johansen (uniejo), Roel de Cock, Nuno Carvalho (smash), Martin Kjeldsen (baest)