if I had to find duplicates in book catalogs with entries with different titles but with the same content..

This is related to the Google Print application. In other words, we have many books, some of which have different titles but the same content. We need to find such cases efficiently. How?

2 thoughts on “if I had to find duplicates in book catalogs with entries with different titles but with the same content..”

rohith

16 years ago

What we can do is store more attributes with every book, such as the number of chapters and the list of chapters (in order), and check for similarity between them. At a high level, I think this is possible.
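A minimal sketch of what such a check might look like, assuming each book carries an ordered list of chapter titles. The names `normalize` and `chapter_similarity` are hypothetical, and using `difflib.SequenceMatcher` as the similarity measure is my own assumption, not something specified in this thread.

```python
from difflib import SequenceMatcher


def normalize(title):
    # Lowercase and collapse whitespace so trivial formatting
    # differences between catalog entries do not count as mismatches.
    return " ".join(title.lower().split())


def chapter_similarity(chapters_a, chapters_b):
    """Return a score in [0, 1] comparing two ordered chapter lists.

    1.0 means the normalized chapter lists are identical, in the same order.
    """
    a = [normalize(c) for c in chapters_a]
    b = [normalize(c) for c in chapters_b]
    # SequenceMatcher compares the lists element by element, respecting order.
    return SequenceMatcher(None, a, b).ratio()


book_1 = ["Intro", "  Methods", "Results"]
book_2 = ["intro", "methods", "RESULTS"]
book_3 = ["Intro", "Methods", "Appendix"]

print(chapter_similarity(book_1, book_2))
print(chapter_similarity(book_1, book_3))
```

Comparing chapter counts first (`len(chapters_a) == len(chapters_b)`) would be a cheap pre-filter before computing the score. Note that this compares one pair of books at a time.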

Pa1

16 years ago

What is your similarity function? That means you need to compare every book with every other book, i.e. an O(n^2) algorithm, doesn’t it?

An application like Google Print has millions of books. If you run an O(n^2) algorithm, it will take years to find similar books, won’t it?
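To put a number on the pairwise cost, and to illustrate one possible way around it, here is a rough sketch. The grouping step is an assumption of mine rather than anything proposed in this thread: it buckets books by an exact signature built from their normalized chapter lists, so it only catches books whose chapter lists match exactly, not near-duplicates. The names `pair_count` and `group_by_signature` are hypothetical.

```python
from collections import defaultdict


def pair_count(n):
    # Number of distinct unordered pairs among n books: n * (n - 1) / 2.
    return n * (n - 1) // 2


def group_by_signature(books):
    """Group book ids whose normalized chapter lists are identical.

    books: iterable of (book_id, chapters) tuples.
    Makes one pass over the catalog instead of comparing every pair.
    """
    buckets = defaultdict(list)
    for book_id, chapters in books:
        # The signature is the ordered tuple of normalized chapter titles.
        signature = tuple(" ".join(c.lower().split()) for c in chapters)
        buckets[signature].append(book_id)
    # Only buckets with more than one book are duplicate candidates.
    return [ids for ids in buckets.values() if len(ids) > 1]


print(pair_count(1_000_000))  # 499999500000 pairs for one million books

catalog = [
    ("A", ["Intro", "Methods", "Results"]),
    ("B", ["intro", "methods", "results"]),
    ("C", ["Preface", "Part One"]),
]
print(group_by_signature(catalog))  # [['A', 'B']]
```

The candidates in each bucket could then be checked with a more careful similarity function, so the expensive comparison runs only within small groups.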
