COMP670R: Topics in Theory: Hashing
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
Dictionaries
D. E. Knuth. Sorting and Searching, volume 3 of The Art of Computer Programming. Chapter 6.4.
Perfect Hashing. Randomized Algorithms, Rajeev Motwani and Prabhakar Raghavan, Section 8.5
Dynamic Perfect Hashing: Upper and Lower Bounds, Martin Dietzfelbinger, Anna Karlin, Kurt Mehlhorn, Friedhelm Meyer auf der Heide, Hans Rohnert, Robert E. Tarjan, SIAM Journal on Computing, Volume 23 , Issue 4 (August 1994), Pages: 738 - 761.
Cuckoo Hashing, Rasmus Pagh, Flemming Friche Rodler, Journal of Algorithms 51 (2004), p. 122-144.
Bloom Filters: http://en.wikipedia.org/wiki/Bloom_filter.
Chazelle, Bernard; Kilian, Joe; Rubinfeld, Ronitt; Tal, Ayellet (2004), "The Bloomier filter: an efficient data structure for static support lookup tables", Proceedings of the Fifteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 30–39.
The power of two random choices. See the survey here.
An optimal Bloom filter replacement, Anna Pagh, Rasmus Pagh, S. Srinivasa Rao.
On the Cell Probe Complexity of Membership and Perfect Hashing, Rasmus Pagh, STOC 2001.
On the Cell Probe Complexity of Dynamic Membership, Ke Yi and Qin Zhang, SODA 2010.
External hashing
D. E. Knuth. Sorting and Searching, volume 3 of The Art of Computer Programming. Chapter 6.4.
M. S. Jensen and R. Pagh. Optimality in external memory hashing. Algorithmica, 52(3):403–411, 2008.
Dynamic External Hashing: The Limit of Buffering, Zhewei Wei, Ke Yi, and Qin Zhang, SPAA 2009.
Locality sensitive hashing
CACM survey of LSH (2008): "Near-Optimal Hashing Algorithms for Approximate Nearest Neighbor in High Dimensions" (by Alexandr Andoni and Piotr Indyk). Communications of the ACM, vol. 51, no. 1, 2008, pp. 117-122. directly from CACM (for free).
Earlier algorithm for Euclidean space (2004): a good introduction to LSH, and the description of affairs as of 2005, is in the following book chapter: Locality-Sensitive Hashing Scheme Based on p-Stable Distributions (by Alexandr Andoni, Mayur Datar, Nicole Immorlica, Piotr Indyk, and Vahab Mirrokni), appearing in the book Nearest Neighbor Methods in Learning and Vision: Theory and Practice, by T. Darrell and P. Indyk and G. Shakhnarovich (eds.), MIT Press, 2006.
Original LSH algorithm (1999): the best algorithm for the Hamming space remains the one described, e.g, in [GIM'99] paper.
Hashing in streaming
The space complexity of approximating the frequency moments, Noga Alon, Yossi Matias, Mario Szegedy, Journal of Computer and System Sciences, Volume 58 , Issue 1, 1999, Pages: 137 - 147
G. Cormode and S. Muthukrishnan. An improved data stream summary: The count-min sketch and its applications. Journal of Algorithms, 55(1):58-75, April 2005.
Randomized Synopses for Query Assurance on Data Streams. Ke Yi, Feifei Li, Marios Hadjieleftheriou, George Kollios, and Divesh Srivastava. ICDE' 08.
Cryptography
Miscellaneous
Why simple hash functions work: exploiting the entropy in a data stream. Michael Mitzenmacher, Salil Vadhan, SODA 2008.