|
||||||||||
PREV CLASS NEXT CLASS | FRAMES NO FRAMES | |||||||||
SUMMARY: NESTED | FIELD | CONSTR | METHOD | DETAIL: FIELD | CONSTR | METHOD |
java.lang.Objectnet.obsearch.utils.BloomFilter64bit
public class BloomFilter64bit
A Bloom filter. SLIGHTLY ADAPTED VERSION OF MG4J it.unimi.dsi.mg4j.util.BloomFilter
KEY CHANGES:
Instances of this class represent a set of character sequences (with false positives) using a Bloom filter. Because of the way Bloom filters work, you cannot remove elements.
Bloom filters have an expected error rate, depending on the number of hash functions used, on the filter size and on the number of elements in the filter. This implementation uses a variable optimal number of hash functions, depending on the expected number of elements. More precisely, a Bloom filter for n character sequences with d hash functions will use ln 2 dn ≈ 1.44 dn bits; false positives will happen with probability 2-d.
Hash functions are generated at creation time using universal hashing. Each hash function
uses NUMBER_OF_WEIGHTS
random integers, which are cyclically multiplied by
the character codes in a character sequence. The resulting integers are XOR-ed together.
This class exports access methods that are very similar to those of Set
,
but it does not implement that interface, as too many non-optional methods
would be unimplementable (e.g., iterators).
Field Summary | |
---|---|
protected static long |
ADDRESS_BITS_PER_UNIT
|
protected static long |
BIT_INDEX_MASK
|
int |
d
The number of hash functions used by this filter. |
long |
m
The number of bits in this filter. |
static int |
NUMBER_OF_WEIGHTS
The number of weights used to create hash functions. |
Constructor Summary | |
---|---|
BloomFilter64bit(int n,
int d)
Creates a new Bloom filter with given number of hash functions and expected number of elements. |
Method Summary | |
---|---|
boolean |
add(long s)
Adds a character sequence to the filter. |
boolean |
contains(long s)
Checks whether the given character sequence is in this filter. |
protected boolean |
getBit(long bitIndex)
Returns from the local bitvector the value of the bit with the specified index. |
long |
getSizeBytes()
|
protected void |
setBit(long bitIndex)
Changes the bit with index bitIndex in local bitvector. |
int |
size()
The number of character sequences in the filter. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
public static final int NUMBER_OF_WEIGHTS
public final long m
public final int d
protected static final long ADDRESS_BITS_PER_UNIT
protected static final long BIT_INDEX_MASK
Constructor Detail |
---|
public BloomFilter64bit(int n, int d)
n
- the expected number of elements.d
- the number of hash functions; if the filter add not more than n
elements,
false positives will happen with probability 2-d.Method Detail |
---|
public int size()
#contains(CharSequence)
).public boolean contains(long s)
Note that this method may return true on a character sequence that is has not been added to the filter. This will happen with probability 2-d, where d is the number of hash functions specified at creation time, if the number of the elements in the filter is less than n, the number of expected elements specified at creation time.
s
- a character sequence.
public boolean add(long s)
s
- a character sequence.
#contains(CharSequence)
).protected boolean getBit(long bitIndex)
bitIndex
- the bit index.
protected void setBit(long bitIndex)
bitIndex
- the index of the bit to be set.public long getSizeBytes()
|
||||||||||
PREV CLASS NEXT CLASS | FRAMES NO FRAMES | |||||||||
SUMMARY: NESTED | FIELD | CONSTR | METHOD | DETAIL: FIELD | CONSTR | METHOD |