net.obsearch.index.ghs
Class CompressedBitSet64

java.lang.Object
  extended by net.obsearch.index.ghs.CompressedBitSet64

public class CompressedBitSet64
extends Object

CompressedBitSet64 stores bits in a byte array. The bit set must first be created (it is stored in a temporary file), and then the bytes are loaded into memory. The compressed bit set works on longs and allows sequential k-nn searches over longs using the Hamming distance. Insertions must be done in ascending order. The main assumption is that compression yields indexes small enough to be stored in memory.

Author:
Arnoldo Jose Muller Molina

Field Summary
protected  int count
           
protected  byte[] data
           
protected  long first
           
 
Constructor Summary
CompressedBitSet64()
          Create a new compressed bit set.
 
Method Summary
 void add(long bit)
          Add the ith bit to this bitset.
 int bucketDistance(long a, long b)
          Hamming distance used for searching.
 void commit()
          Stop adding values; from now on the bit set is used for searching.
protected  long[] getAll()
          Return all the buckets just for debugging purposes.
 long getBytesSize()
           
protected  long read(it.unimi.dsi.io.InputBitStream in)
           
 long[] searchBuckets(long query, int maxF, int m)
          Search the maxF closest buckets by Hamming distance for the given query.
 List<OBResultInvertedByte<Long>> searchFull(long query)
          Search the entire bitset.
 int size()
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

data

protected byte[] data

count

protected int count

first

protected long first
Constructor Detail

CompressedBitSet64

public CompressedBitSet64()
                   throws OBException
Create a new compressed bit set.

Throws:
IOException
OBException
Method Detail

add

public void add(long bit)
         throws OBException
Add the ith bit to this bitset.

Parameters:
bit -
Throws:
OBException
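
This page does not describe how the bit set is compressed. One common reason for requiring ascending insertions is gap (delta) encoding, in which each value is stored as its difference from the previous one. The following is a minimal sketch of that general technique, not the encoding used by CompressedBitSet64; the class GapEncodingSketch and all of its members are hypothetical, and values are assumed to be non-negative.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical illustration of gap encoding; not part of OBSearch. */
final class GapEncodingSketch {
    private final ByteArrayOutputStream out = new ByteArrayOutputStream();
    private long previous;
    private int count;

    /** Appends a value; values must arrive in strictly ascending order. */
    void add(long value) {
        if (count > 0 && value <= previous) {
            throw new IllegalArgumentException("values must be added in ascending order");
        }
        // Store the first value as-is and every later value as a gap.
        long gap = (count == 0) ? value : value - previous;
        writeVarLong(gap);
        previous = value;
        count++;
    }

    /** Writes v using 7 payload bits per byte; a set high bit means more bytes follow. */
    private void writeVarLong(long v) {
        while ((v & ~0x7FL) != 0) {
            out.write((int) ((v & 0x7F) | 0x80));
            v >>>= 7;
        }
        out.write((int) v);
    }

    int size() {
        return count;
    }

    byte[] toBytes() {
        return out.toByteArray();
    }

    /** Rebuilds the original values by summing the stored gaps. */
    static long[] decode(byte[] data, int count) {
        long[] values = new long[count];
        int pos = 0;
        long previous = 0;
        for (int i = 0; i < count; i++) {
            long gap = 0;
            int shift = 0;
            byte b;
            do {
                b = data[pos++];
                gap |= (long) (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            previous = (i == 0) ? gap : previous + gap;
            values[i] = previous;
        }
        return values;
    }
}
```

Small gaps between consecutive values take fewer bytes than full 8-byte longs, which is why sorted input tends to compress well under this kind of scheme.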

commit

public void commit()
            throws OBException
Stop adding values; from now on the bit set is used for searching.

Throws:
OBException

getBytesSize

public long getBytesSize()

bucketDistance

public final int bucketDistance(long a,
                                long b)
Hamming distance used for searching.

Parameters:
a -
b -
Returns:
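
For reference, the Hamming distance between two longs is the number of bit positions in which they differ. The sketch below shows the standard way to compute it in Java; the class HammingSketch is hypothetical and is not claimed to match the body of bucketDistance.

```java
/** Hypothetical helper; not part of OBSearch. */
final class HammingSketch {
    /** Returns the number of bit positions in which a and b differ. */
    static int hamming(long a, long b) {
        // XOR leaves a 1 in every position where the operands differ;
        // Long.bitCount counts those 1 bits.
        return Long.bitCount(a ^ b);
    }

    public static void main(String[] args) {
        System.out.println(hamming(0b1010L, 0b0110L)); // prints 2
        System.out.println(hamming(0L, -1L));          // prints 64
    }
}
```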

searchBuckets

public long[] searchBuckets(long query,
                            int maxF,
                            int m)
                     throws OBException
Search the maxF closest buckets by Hamming distance for the given query.

Parameters:
query - The query we will use
maxF - The number of objects that will be searched.
m - The number of bits, which equals the maximum expected distance.
Returns:
The closest objects to the given query.
Throws:
InstantiationException
IllegalAccessException
OBException
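
A sequential k-nn search of this kind can be sketched as a single pass over the stored values that keeps the maxF nearest seen so far. The example below is an illustration only, assuming the buckets are already available as a plain long[]; the class SequentialKnnSketch and its method closest are hypothetical, the m parameter is omitted, and the order of results with equal distance is unspecified.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

/** Hypothetical illustration of a sequential k-nn scan; not part of OBSearch. */
final class SequentialKnnSketch {
    /** Returns up to maxF buckets closest to query by Hamming distance, nearest first. */
    static long[] closest(long[] buckets, long query, int maxF) {
        if (maxF <= 0) {
            return new long[0];
        }
        // Max-heap on distance: the head is the worst candidate kept so far.
        // Each entry is {distance, bucket}.
        PriorityQueue<long[]> heap = new PriorityQueue<>(
                Comparator.comparingLong((long[] e) -> e[0]).reversed());
        for (long b : buckets) {
            int d = Long.bitCount(b ^ query);
            if (heap.size() < maxF) {
                heap.add(new long[] {d, b});
            } else if (d < heap.peek()[0]) {
                // Strictly better than the current worst: replace it.
                heap.poll();
                heap.add(new long[] {d, b});
            }
        }
        // Polling yields the farthest first, so fill the result from the end.
        long[] result = new long[heap.size()];
        for (int i = result.length - 1; i >= 0; i--) {
            result[i] = heap.poll()[1];
        }
        return result;
    }
}
```

The bounded heap keeps memory proportional to maxF rather than to the number of buckets, at the cost of a logarithmic update for each candidate that is admitted.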

searchFull

public List<OBResultInvertedByte<Long>> searchFull(long query)
                                            throws OBException
Search the entire bitset. This should only be done with datasets that fit in memory, and only to generate statistics on the data.

Parameters:
query - query to search
Returns:
Throws:
OBException

size

public int size()

getAll

protected long[] getAll()
                 throws IOException
Return all the buckets just for debugging purposes.

Returns:
Throws:
IOException

read

protected long read(it.unimi.dsi.io.InputBitStream in)
             throws IOException
Throws:
IOException


Copyright © 2007-2011 Arnoldo Jose Muller Molina. All Rights Reserved.