Hashing Hyperplane Queries to Near Points with Applications to Large-Scale Active Learning

Prateek Jain$^1$, Sudheendra Vijayanarasimhan$^2$, and Kristen Grauman$^2$
Microsoft Research Lab, Bangalore, India$^1$,
University of Texas, Austin, TX, USA$^2$


Goal: For large-scale active learning, we want to repeatedly query annotators to label the most uncertain examples in a massive pool of unlabeled data $\mathcal{U}$.


The margin-based selection criterion for SVMs [Tong & Koller, 2000] selects the point nearest to the current decision boundary: $\mathbf{x}^*=\operatorname{argmin}_{\mathbf{x}_i\in \mathcal{U}}\vert\mathbf{w}^T\mathbf{x}_i\vert$.

Problem: With massive unlabeled pool, we cannot afford exhaustive linear scan.
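For concreteness, the exhaustive baseline looks like the following minimal Python sketch (the name `margin_select` and the list-of-floats feature representation are illustrative assumptions, not from the paper):

```python
def margin_select(w, pool):
    """Exhaustive margin-based selection [Tong & Koller, 2000]:
    return the pool point minimizing |w^T x|.
    Cost is O(|pool| * d) inner products per selection round,
    which is what the hashing schemes below avoid."""
    return min(pool, key=lambda x: abs(sum(wi * xi for wi, xi in zip(w, x))))
```

With $\mathbf{w}=[1,0]$, a pool point orthogonal to $\mathbf{w}$ (i.e. on the decision boundary) is selected first.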

Main Idea: Sub-linear Time Active Selection

Idea: We define two hash function families that are locality-sensitive for the problem of searching for the nearest database point to a hyperplane query. The two variants trade off tightness of error bounds against computational cost.


Main contributions:

- Two locality-sensitive hash families (H-Hash and EH-Hash) for retrieving points near a hyperplane query, with collision-probability guarantees.
- Sub-linear time margin-based active selection for SVMs over massive unlabeled pools.
- Large-scale active learning results on document and image datasets of up to one million examples.

Background: Locality-Sensitive Hashing (LSH)

Let $ d(\cdot,\cdot)$ be a distance function over items from a set $ S$, and for any item $ p \in S$, let $ B(p, r)$ denote the set of examples from $ S$ within radius $ r$ from $ p$.


Let $h_{\mathcal{H}}$ denote a random choice of a hash function from the family $\mathcal{H}$. The family $\mathcal{H}$ is called $(r, r(1+\epsilon), p_1, p_2)$-sensitive for $d(\cdot,\cdot)$ when, for any $q, p \in S$,

- if $p \in B(q, r)$, then $\Pr[h_{\mathcal{H}}(q) = h_{\mathcal{H}}(p)] \geq p_1$;
- if $p \notin B(q, r(1+\epsilon))$, then $\Pr[h_{\mathcal{H}}(q) = h_{\mathcal{H}}(p)] \leq p_2$.


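To make the mechanism concrete, here is a minimal LSH sketch for angular distance, where each hash bit is the sign of a random projection and $k$ bits are concatenated into a bucket key (all names are illustrative, not from the paper):

```python
import random

def sign_hash(u):
    """One random-projection bit: a locality-sensitive hash for angular distance."""
    return lambda p: 1 if sum(a * b for a, b in zip(u, p)) >= 0.0 else 0

def build_table(points, hash_fns):
    """Bucket each database point under its concatenated k-bit key."""
    table = {}
    for p in points:
        table.setdefault(tuple(h(p) for h in hash_fns), []).append(p)
    return table

def query(table, q, hash_fns):
    """Return only the points colliding with q; just one bucket is scanned."""
    return table.get(tuple(h(q) for h in hash_fns), [])

rng = random.Random(0)
hash_fns = [sign_hash([rng.gauss(0.0, 1.0) for _ in range(2)]) for _ in range(4)]
table = build_table([[1.0, 0.0], [-1.0, 0.0]], hash_fns)
```

Close points tend to share a bucket (a point always collides with itself), while an antipodal point flips every bit and lands elsewhere.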
First Solution: Hyperplane Hash

Intuition: To retrieve those points for which $\vert\mathbf{w}^T \mathbf{x}\vert$ is small, we want collisions to be probable for database vectors that are nearly perpendicular to the hyperplane normal $\mathbf{w}$ (assuming normalized data).
For $\mathbf{u}\sim \mathcal{N}(0,I)$, $\Pr[\mathrm{sign}(\mathbf{u}^T\mathbf{w})\neq\mathrm{sign}(\mathbf{u}^T\mathbf{x})]=\frac{1}{\pi} \theta_{\mathbf{w},\mathbf{x}}$, where $\theta_{\mathbf{w},\mathbf{x}}$ is the angle between $\mathbf{w}$ and $\mathbf{x}$ [Goemans & Williamson, 1995].
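This classical fact is easy to check numerically; the sketch below estimates the sign-disagreement probability by Monte Carlo in pure Python (the function name and parameters are illustrative):

```python
import random

def disagree_prob(w, x, trials=200_000, seed=0):
    """Monte Carlo estimate of Pr[sign(u^T w) != sign(u^T x)] for u ~ N(0, I).
    Should approach theta_{w,x} / pi as trials grows."""
    rng = random.Random(seed)
    d = len(w)
    hits = 0
    for _ in range(trials):
        u = [rng.gauss(0.0, 1.0) for _ in range(d)]
        sw = sum(a * b for a, b in zip(u, w)) >= 0.0
        sx = sum(a * b for a, b in zip(u, x)) >= 0.0
        hits += sw != sx
    return hits / trials

# Orthogonal pair: theta = pi/2, so the predicted probability is 1/2.
est = disagree_prob([1.0, 0.0], [0.0, 1.0])
```

Note the probability depends only on the angle, not the vector norms, since scaling a vector does not change the sign of its projection.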

Our idea: Generate two independent random vectors $\mathbf{u}$ and $\mathbf{v}$: one to capture the angle between $\mathbf{w}$ and $\mathbf{x}$, and one to capture the angle between $-\mathbf{w}$ and $\mathbf{x}$.


Definition: We define the H-Hash function family $\mathcal{H}$ as:

$\displaystyle h_{\mathcal{H}}(\mathbf{z})= \begin{cases}h_{\mathbf{u},\mathbf{v}}(\mathbf{z},\mathbf{z}), &\text{if $\mathbf{z}$ is a database point vector,}\\ h_{\mathbf{u},\mathbf{v}}(\mathbf{z},-\mathbf{z}), &\text{if $\mathbf{z}$ is a query hyperplane vector,} \end{cases}$

where $h_{\mathbf{u},\mathbf{v}}(\mathbf{a},\mathbf{b})=[\mathrm{sign}(\mathbf{u}^T\mathbf{a}),\ \mathrm{sign}(\mathbf{v}^T\mathbf{b})]$ is a two-bit hash, and $\mathbf{u},\mathbf{v}\sim \mathcal{N}(0,I)$.
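A minimal Python sketch of this two-bit hash (function names are illustrative; vectors are plain lists of floats):

```python
def _sign(t):
    return 1 if t >= 0.0 else -1

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def h_hash(u, v, z, is_query=False):
    """H-Hash two-bit code:
    database point x  -> [sign(u^T x),  sign(v^T x)]   (second arg is z)
    query hyperplane w -> [sign(u^T w), sign(-v^T w)]  (second arg is -z)"""
    b = [-t for t in z] if is_query else z
    return (_sign(_dot(u, z)), _sign(_dot(v, b)))
```

As a sanity check, a point lying exactly on the query hyperplane, e.g. $\mathbf{x}=[1,1]$ with $\mathbf{w}=[1,-1]$ (so $\mathbf{w}^T\mathbf{x}=0$), can collide with the query under suitable $\mathbf{u},\mathbf{v}$.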


Second Solution: Embedded Hyperplane Hash

Intuition: Design a Euclidean embedding under which minimizing the distance to the embedded query is equivalent to minimizing $\vert\mathbf{w}^T \mathbf{x}\vert$, making existing approximate NN methods applicable.

Definition: We define the EH-Hash function family $\mathcal{E}$ as:

$\displaystyle h_{\mathcal{E}}(\mathbf{z})= \begin{cases}h_{\mathbf{u}}\left(V(\mathbf{z})\right), &\text{if $\mathbf{z}$ is a database point vector,}\\ h_{\mathbf{u}}\left(-V(\mathbf{z})\right), &\text{if $\mathbf{z}$ is a query hyperplane vector,} \end{cases}$

where $V(\mathbf{a}) = \mathrm{vec}(\mathbf{a}\mathbf{a}^T)=\left[a_1^2,\ a_1a_2, \dots, a_1a_d,\ a_2^2,\ a_2a_3, \dots,\ a_d^2\right]$ gives the embedding, and $h_{\mathbf{u}}(\mathbf{b})=\mathrm{sign}(\mathbf{u}^T\mathbf{b})$, with $\mathbf{u}\in \mathbb{R}^{d^2}$ sampled from $\mathcal{N}(0,I)$.
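A sketch of the embedding and the resulting one-bit hash, following the upper-triangular listing of $V(\mathbf{a})$ given above (illustrative Python; names are not from the paper):

```python
def embed(a):
    """V(a): pairwise products a_i * a_j for j >= i,
    listed in the upper-triangular order given in the definition."""
    d = len(a)
    return [a[i] * a[j] for i in range(d) for j in range(i, d)]

def eh_hash(u, z, is_query=False):
    """EH-Hash one-bit code: sign(u^T V(z)) for a database point,
    sign(u^T (-V(z))) for a query hyperplane."""
    e = embed(z)
    if is_query:
        e = [-t for t in e]
    return 1 if sum(x * y for x, y in zip(u, e)) >= 0.0 else -1
```

For example, `embed([1.0, 2.0])` is `[1.0, 2.0, 4.0]`, and negating the embedding for queries flips the bit relative to the same vector hashed as a database point.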


Issue: $V(\mathbf{a})$ is $d^2$-dimensional, so hashing overhead is higher.

Solution: Compute $ h_{\u }(V(\mathbf{a}))$ approximately using randomized sampling:


H-Hash has faster pre-processing, but EH-Hash has stronger bounds.

           | Accuracy                                            | Hashing insertion time
  H-Hash:  | $p_1=\frac{1}{4}-\frac{r}{\pi^2}$                   | $\propto d$
  EH-Hash: | $p_1 \ge 2\left(\frac{1}{4}-\frac{r}{\pi^2}\right)$ | $\propto d^2$ ($d$ with sampling)

Experimental Results

Goal: Show that proposed algorithms can select examples nearly as well as the exhaustive approach, but with substantially greater efficiency.

Newsgroups: 20K documents, bag-of-words features.

Tiny Images: 60K-1M images, Gist features.


Hashing Hyperplane Queries to Near Points with Applications to Large-Scale Active Learning,
P. Jain, S. Vijayanarasimhan and K. Grauman, in NIPS 2010
[paper, supplementary]