How to make Google Books at home

By Andrey Shitov (ash)
Date: Thursday, 14 August 2008 10:40
Duration: 30 minutes
Tags: book highlight morphology search text

You can find more information on the speaker's site:

You've published a book. You want to let readers search through it online. From the search page you want not only a list of the pages that contain the word or phrase, but also a screenshot of each matching page with the found words highlighted.

You have to solve a number of hard problems to get a good result. I will cover some of them using a live example.

* Working with book layout and converting it into a suitable format.
* Extracting paragraphs, phrases and words from the layout.
* Understanding the importance of separate words.
* Working out how to restore the word order if the source has damaged it.
* Restoring words split by hyphens.
* Indexing the text of a book.
* Which is better for the index: a dictionary or a morphology engine?
* Building the cloud of popular words.
* Generating previews and thumbnails.
* Highlighting words that are found.
* Caching search results.
* Adding hot word lists to search results.
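
A few of the steps above (restoring hyphenated words, indexing the text, and highlighting found words) can be illustrated with a minimal Python sketch. It is not the speaker's implementation: the function names, the `pages` layout (page number mapped to a list of extracted text lines), and the bracket-style highlighting are all assumptions made for illustration. Highlighting here marks plain text, as a stand-in for highlighting words on a page image.

```python
import re
from collections import defaultdict


def join_hyphenated(lines):
    """Rejoin words split across line ends, e.g. 'morpho-' + 'logy'.

    Simplification (assumption): any line-final hyphen followed by a
    lowercase letter is treated as a split word, so a real compound such
    as 'well-' + 'known' would be joined incorrectly.
    """
    text = ""
    for line in lines:
        line = line.strip()
        if text.endswith("-") and line[:1].islower():
            text = text[:-1] + line  # drop the hyphen and glue the halves
        elif text:
            text += " " + line
        else:
            text = line
    return text


def build_index(pages):
    """Map each lowercased word to the set of page numbers containing it."""
    index = defaultdict(set)
    for page_no, lines in pages.items():
        for word in re.findall(r"\w+", join_hyphenated(lines).lower()):
            index[word].add(page_no)
    return index


def search(index, query):
    """Return the sorted page numbers that contain every word of the query."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)


def highlight(text, query):
    """Wrap each query word found in the text in square brackets."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return text
    pattern = re.compile(
        r"\b(" + "|".join(map(re.escape, words)) + r")\b", re.IGNORECASE
    )
    return pattern.sub(r"[\1]", text)
```

This sketch indexes exact word forms only. The dictionary or morphology engine mentioned in the list would replace the plain lowercasing step, so that different forms of the same word map to one index entry.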

Attended by: Andrey Shitov (ash), Paul-Christophe Varoutas, Casiano Rodriguez-Leon (casiano), Diego Kuperman (diegok), Kaare Rasmussen, Andreas Hetey, Gianni Ceccarelli (dakkar)