Faster Regexes: What to do when text matching is your bottleneck
By Aaron Crane (arc) from Edinburgh.pm, London.pm
Date: Thursday, 14 August 2008 15:10
Duration: 30 minutes
Language:
Tags: optimisation regex regexp
We all know how good Perl is at munging text. But what do you do when your Perl text-munging code isn't fast enough for what you're trying to do?
We needed to extract useful information from tens of gigabytes of web-server log files. Our Perl code was simple and obvious, but not fast enough for our purposes. When profiling revealed a frequently executed regex as the bottleneck, we tried several things to make it faster.
This talk looks at what we did to speed up our regex-heavy code (by a factor of well over 100 in some places), identifying a few general-purpose optimisation techniques on the way.
Attended by: Vincent Pit (vincent), Leon Brocard (acme), Andrey Shitov (ash), Anton Berezin (Grrrr), Barbie, Jacob Bunk Nielsen, Steffen Mueller, Stefan Hornburg (Racke), Casiano Rodriguez-Leon (casiano), Damian Conway (damian), Sue Mynott (virtualsue), Francoise Dehinbo (franky), Michael Zedeler (mzedeler), Rasmus Hansen (rasmoo), Dmitry Karasik (McFist), Karen Pauley, Alex Balhatchet (Kaoru), William Travis Holton, Stéphane Payrard (cognominal), Alberto Simões (ambs), Alex Kapranoff (kappa), Arne Sommer (Arne), Martin Schipany (ElCondor), Stefan Hanski, Darius Jokilehto, Stan Sawa, Jörg Plate (Patterner), allan dystrup (ady), Patrick Michaud (Pm), Wendy Van Dijk (woolfy), David Jack Wange Olrik (davidolrik), Andreas Hetey, Mark Fowler (Trelane), Gertraud Unterreitmeier (Gertraud), Nicholas Clark, Bart Lateur, Lars Dɪᴇᴄᴋᴏᴡ (daxim), Matija Grabnar (matija), Søren Døygaard, Erik Johansen (uniejo), Roel de Cock, Nuno Carvalho (smash), Martin Kjeldsen (baest)