Document Deduplication with Locality-Sensitive Hashing