================================================================================
    UT Zappos50K Shoe Dataset (ver 1.2)
================================================================================

UT Zappos50K is a large shoe dataset consisting of 50,025 catalog images 
collected from Zappos.com. It was created in the context of an online 
shopping task, where users care specifically about fine-grained visual 
differences. Most images are 136 x 102 pixels. The shoes are centered on a 
white background and pictured in the same orientation for convenient analysis.

The images are stored in a directory tree. The relative path to each image is 
stored in the following file:

  image-path.mat :: Cell array containing the relative path to each image.

  
This dataset was created for our work on fine-grained attribute comparison and 
is made available for non-commercial use only. If you use this dataset in a 
publication, please cite the following paper:

  Aron Yu and Kristen Grauman. "Fine-Grained Visual Comparisons with Local 
  Learning". In CVPR, 2014.

BibTeX:

@inproceedings {fine-grained,
  author = {A. Yu and K. Grauman},
  title = {{F}ine-{G}rained {V}isual {C}omparisons with {L}ocal {L}earning},
  booktitle = {Computer Vision and Pattern Recognition (CVPR)},
  month = {June},
  year = {2014}
}

--------------------------------------------------------------------------------
    | Image Features
--------------------------------------------------------------------------------

  zappos-gist.mat :: 960-dimensional GIST descriptors (50025 x 960).

  zappos-color.mat :: 30-dimensional LAB color histogram (50025 x 30).
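Both feature matrices have 50,025 rows, one per image. The sketch below assumes (this is not stated explicitly above) that row k of each matrix describes image #k in image-path.mat, using the same 1-based numbering as the comparison labels. It assumes the matrices have already been loaded as nested Python lists. Concatenating GIST and color into a single 990-dimensional vector is one illustrative way to combine them, not necessarily the setup used in the paper. The function name `image_features` is hypothetical.

```python
from typing import List, Sequence


def image_features(gist: Sequence[Sequence[float]],
                   color: Sequence[Sequence[float]],
                   image_index: int) -> List[float]:
    """Return the GIST and color descriptors of one image, concatenated.

    `image_index` is 1-based (MATLAB convention), so row k-1 of each
    matrix is assumed to describe image #k.
    """
    if len(gist) != len(color):
        raise ValueError("feature matrices must have the same number of rows")
    if not 1 <= image_index <= len(gist):
        raise IndexError(f"image index {image_index} out of range")
    row = image_index - 1  # convert 1-based image index to 0-based row
    return list(gist[row]) + list(color[row])
```

With the full matrices, each returned vector would have 960 + 30 = 990 entries.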


--------------------------------------------------------------------------------
    | Relative Attribute Labels
--------------------------------------------------------------------------------

  zappos-labels.mat :: Regular comparison label matrix (UT-Zap50K-1).

  zappos-labels-fg.mat :: Fine-grained comparison label matrix (UT-Zap50K-2).


Human annotations were collected on Amazon Mechanical Turk (mTurk) for 4 
relative attributes: "open", "pointy at the toe", "sporty", and "comfort".

Columns of the label matrix (N x 6):
  (1) Image index of the first image.
  (2) Image index of the second image.
  (3) Attribute ID: 1 = open, 2 = pointy, 3 = sporty, 4 = comfort.
  (4) Comparison label: 1 = 1st image has *more* attribute than 2nd image
                        2 = 1st image has *less* attribute than 2nd image
                        3 = both images have the attribute equally
  (5) Average confidence score: 1 = very confident
                                2 = somewhat confident
                                3 = not confident
  (6) Agreement score: Fraction of mTurk workers who gave the majority vote,
                       from 0 to 1 (1 = full agreement).
  
Images are indexed by their (1-based) position in the image path cell array 
(image-path.mat); e.g., Image #1 has CID 100627-72. Each pair of images is 
labeled by 5 unique mTurk workers. Comparison labels are assigned by majority 
vote, and the confidence scores are averaged. Pairs with low confidence 
(average score > 2) and low agreement (< 0.5) have been pruned.
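The per-pair aggregation described above (majority vote over 5 workers, averaged confidence, agreement as the majority fraction) can be sketched as follows. This is an illustration, not the script used to build the dataset. The README does not say how ties were broken (e.g. a 2-2-1 split), so this sketch assumes the label encountered first wins. The function name `aggregate_votes` is hypothetical. The pruning thresholds can then be applied to the returned confidence and agreement values.

```python
from collections import Counter
from typing import Sequence, Tuple


def aggregate_votes(votes: Sequence[Tuple[int, int]]) -> Tuple[int, float, float]:
    """Combine per-worker (label, confidence) votes for one image pair.

    Returns (majority label, average confidence, agreement), where agreement
    is the fraction of workers who chose the majority label.
    """
    if not votes:
        raise ValueError("need at least one vote")
    counts = Counter(label for label, _ in votes)
    # most_common(1) picks the highest count; on a tie it returns the label
    # encountered first (an assumed tie-break, not documented for the dataset).
    label, n_majority = counts.most_common(1)[0]
    avg_confidence = sum(conf for _, conf in votes) / len(votes)
    return label, avg_confidence, n_majority / len(votes)
```

For example, five workers all voting (2, 1) yield (2, 1.0, 1.0), which matches the last three columns of the example row below.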

  e.g. 44836 5597 3 2 1 1
  Image #44836 is less "sporty" than Image #5597. All workers are very 
  confident in their decisions and are in full agreement.
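To make the column layout concrete, the sketch below decodes a 6-column label row into named fields and renders it as a sentence like the one above. The names `Comparison`, `parse_row`, and `describe` are hypothetical helpers. It assumes each row has already been loaded as a sequence of 6 numbers.

```python
from typing import NamedTuple, Sequence

# Code tables taken from the column description above.
ATTRIBUTES = {1: "open", 2: "pointy", 3: "sporty", 4: "comfort"}
RELATIONS = {1: "more", 2: "less"}


class Comparison(NamedTuple):
    image1: int        # 1-based index into image-path.mat
    image2: int        # 1-based index into image-path.mat
    attribute: str     # attribute name decoded from column 3
    label: int         # 1 = more, 2 = less, 3 = equal
    confidence: float  # averaged score: 1 = very, 3 = not confident
    agreement: float   # fraction of workers who gave the majority vote


def parse_row(row: Sequence[float]) -> Comparison:
    """Decode one 6-column label row into named fields."""
    img1, img2, attr, label, conf, agree = row
    return Comparison(int(img1), int(img2), ATTRIBUTES[int(attr)],
                      int(label), float(conf), float(agree))


def describe(c: Comparison) -> str:
    """Render a decoded comparison as an English sentence."""
    if c.label == 3:
        return (f'Image #{c.image1} and Image #{c.image2} are equally '
                f'"{c.attribute}".')
    return (f'Image #{c.image1} is {RELATIONS[c.label]} "{c.attribute}" '
            f'than Image #{c.image2}.')
```

For the example row, `describe(parse_row([44836, 5597, 3, 2, 1, 1]))` returns `Image #44836 is less "sporty" than Image #5597.`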

The labels are stored as a cell array in which each cell contains the label 
matrix for a single attribute. Each row of a matrix represents a single pair 
of shoe images compared on that attribute.
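If label rows for several attributes end up in one flat list (for example after exporting them from MATLAB), the per-attribute layout of the cell array can be recovered by grouping on the attribute ID in column 3. This is a minimal sketch. The function name `split_by_attribute` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def split_by_attribute(rows: Sequence[Sequence[float]]) -> Dict[int, List[Sequence[float]]]:
    """Group flat 6-column label rows by attribute ID (column 3).

    Row order within each attribute is preserved.
    """
    groups: Dict[int, List[Sequence[float]]] = defaultdict(list)
    for row in rows:
        groups[int(row[2])].append(row)
    return dict(groups)
```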

Note: Label matrices are best viewed in Matlab using 'format short g'.


--------------------------------------------------------------------------------
    | Meta-data Labels
--------------------------------------------------------------------------------

  meta-data.csv :: Raw enum labels taken from Zappos.com.

  meta-data.mat :: Cell array containing the enum labels.

  meta-data-bin.csv :: Binary labels expanded from enum labels. Each type 
                       within an enum has its own label. 


Each row in the CSV file represents a different shoe. In addition to the CID, 
there are 8 enum labels, described below. Every shoe has the Category and 
SubCategory labels; the remaining labels (marked [opt]) may be missing. Some 
shoes have multiple types within a single enum label, separated by semicolons.
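A minimal sketch of reading meta-data.csv with Python's `csv` module, splitting multi-valued cells on semicolons. It assumes the file has a header row whose column names include `CID` and the label names listed below. Check this against the actual file. The function name `read_meta` is hypothetical.

```python
import csv
import io
from typing import Dict, List, Tuple


def read_meta(text: str) -> List[Tuple[str, Dict[str, List[str]]]]:
    """Parse meta-data CSV text into (CID, {label: [types]}) pairs.

    Multi-valued cells are split on semicolons; empty cells (missing
    optional labels) become empty lists.
    """
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        cid = row.pop("CID")
        labels = {
            field: [t.strip() for t in (value or "").split(";") if t.strip()]
            for field, value in row.items()
        }
        records.append((cid, labels))
    return records
```

To use it on the real file, pass the file contents, e.g. read via `open(path, newline="")`.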

  CID: Image ID in the form of ProductID-ColorID, where ProductID is the 
       Zappos product identifier. There can be several shoes of the same kind
       (same ProductID) but in different colors.
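Because a CID combines ProductID and ColorID, the color variants of the same product can be grouped by splitting on the hyphen. This sketch assumes the ProductID itself contains no hyphen, so it splits at the first one. It keeps both parts as strings so that no formatting is lost. The names `split_cid` and `group_colorways` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def split_cid(cid: str) -> Tuple[str, str]:
    """Split a CID such as '100627-72' into (ProductID, ColorID)."""
    product_id, sep, color_id = cid.partition("-")
    if not sep or not product_id or not color_id:
        raise ValueError(f"malformed CID: {cid!r}")
    return product_id, color_id


def group_colorways(cids: Iterable[str]) -> Dict[str, List[str]]:
    """Map each ProductID to the ColorIDs of its color variants."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for cid in cids:
        product_id, color_id = split_cid(cid)
        groups[product_id].append(color_id)
    return dict(groups)
```

For example, `split_cid("100627-72")` returns `("100627", "72")`.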

  Category: Broad category name assigned by Zappos.

  SubCategory: Specific sub-category name assigned by Zappos.

  HeelHeight [opt]: Height of the heel. 
        
  Insole [opt]: Materials used to make the insole. 

  Closure [opt]: Mechanisms to enclose the foot.

  Gender [opt]: Recommended gender for the shoe.

  Material [opt]: Materials used to make the shoe.

  ToeStyle [opt]: Style at the front of the shoe.
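The enum-to-binary expansion behind meta-data-bin.csv can be illustrated with a one-hot (multi-hot) encoding in which every type within an enum label gets its own 0/1 column. The column order and naming in the distributed file are not documented here. This sketch simply sorts the types alphabetically (an assumption). Its input is a list of dicts mapping a label name to the list of types obtained by splitting the semicolon-separated cells. The function name `binarize` is hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def binarize(records: Sequence[Dict[str, List[str]]],
             field: str) -> Tuple[List[str], List[List[int]]]:
    """Expand one enum label into binary indicator columns.

    Returns (types, matrix): `types` is the sorted list of distinct types
    seen for `field`, and matrix[i][j] is 1 if record i lists types[j].
    Records without the (optional) label get an all-zero row.
    """
    types = sorted({t for rec in records for t in rec.get(field, [])})
    column = {t: j for j, t in enumerate(types)}
    matrix = []
    for rec in records:
        row = [0] * len(types)
        for t in rec.get(field, []):
            row[column[t]] = 1  # multi-valued cells set several columns
        matrix.append(row)
    return types, matrix
```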


--------------------------------------------------------------------------------
    | Fine-Grained Rationales
--------------------------------------------------------------------------------

  zappos-fg-rationale.mat :: Raw labels for each pair including rationales.


In addition to the comparison labels, we also collected annotator rationales 
for the fine-grained pairs. Each mTurk worker had to give a one-sentence or 
one-phrase rationale explaining *why* they made a particular fine-grained 
decision; the rationales also serve as quality control. Each fine-grained pair 
has rationales from 5 workers. Each worker's annotation is stored individually 
(rather than aggregated per pair), in the same column format as above.


--------------------------------------------------------------------------------
    | Train-Test Splits
--------------------------------------------------------------------------------

  train-test-splits :: Indices of the training and testing sets, i.e., row 
                       indices into the provided comparison label matrices. 
                       The first level of cells represents attributes and the 
                       second level of cells represents the individual splits.

  train-test-splits-pair :: Same format as above, but containing the actual 
                            supervision pairs instead of indices.


We provide the train/test splits used to produce the results in our paper. We 
also provide a demo script that sets up an experiment for a single split of a 
single attribute. There are 10 splits for each of the 4 attributes. Please 
insert your own learning functions into the scripts.
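Outside MATLAB, selecting the rows for one split might look like the sketch below. It assumes that the train and test indices for a given attribute and split are two lists of 1-based row numbers into that attribute's label matrix. The exact structure inside each split cell is not documented here, so verify it against the provided files. The function name `select_split` is hypothetical.

```python
from typing import List, Sequence, Tuple

Row = Sequence[float]


def select_split(labels: Sequence[Row],
                 train_idx: Sequence[int],
                 test_idx: Sequence[int]) -> Tuple[List[Row], List[Row]]:
    """Pick the train/test label rows for one attribute and one split.

    Indices are assumed to be 1-based row numbers (MATLAB convention).
    """
    n = len(labels)
    for i in (*train_idx, *test_idx):
        if not 1 <= i <= n:  # also guards against silent negative indexing
            raise IndexError(f"row index {i} out of range 1..{n}")
    overlap = set(train_idx) & set(test_idx)
    if overlap:
        raise ValueError(f"rows in both train and test: {sorted(overlap)}")
    train = [labels[i - 1] for i in train_idx]
    test = [labels[i - 1] for i in test_idx]
    return train, test
```

A driver would call `select_split` for each of the 10 splits of each of the 4 attributes and pass the resulting rows to your own learning function.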


--------------------------------------------------------------------------------
    | Download Contents
--------------------------------------------------------------------------------

  ut-zap50k-data.zip :: All relevant data for this dataset.
  
  ut-zap50k-feats.zip :: GIST + Color features for each image.
  
  ut-zap50k-images.zip :: Full set of 50,025 colored shoe images.
  

--------------------------------------------------------------------------------
    | Contact Info
--------------------------------------------------------------------------------

For details on how to use this dataset, please refer to our paper and visit 
our project webpage at vision.cs.utexas.edu/projects/finegrained

If you have any questions, please contact Aron Yu at aron.yu@utexas.edu


--------------------------------------------------------------------------------
    | Acknowledgement
--------------------------------------------------------------------------------

We thank Mark Stephenson for his help creating this dataset.