PubFig83 + LFW Dataset

Task: find George Clooney and Angelina Jolie pictures online

The PubFig83 + LFW dataset combines the PubFig83 and LFW datasets to form a new benchmark for open-universe face identification. Based on the realistic scenarios of automatically searching for people in web photos or tagging friends and family in personal photo albums, the purpose of the dataset is to allow algorithms to find and identify some individuals while ignoring all others as background, or distractor, faces. This mimics many real-world applications where face recognition needs to ignore the many background faces that appear in photos but are not relevant to the user. PubFig83+LFW has 13,002 faces representing 83 individuals from PubFig83, divided into a 2/3 training set (8,720 faces) and a 1/3 testing set (4,282 faces). From LFW, 12,066 faces representing over 5,000 individuals are used as a distractor set.
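
The open-universe setting described above can be illustrated with a minimal sketch: a classifier produces a confidence score for each known individual, and a face is rejected as a distractor when even the best score is too low. The function name, the score dictionary, and the threshold value are hypothetical and chosen only for illustration; this is not the LASRC algorithm from our paper.

```python
from typing import Dict, Optional


def identify_open_universe(scores: Dict[str, float], threshold: float) -> Optional[str]:
    """Return the best-matching known identity, or None for a background face.

    `scores` maps each known individual to a confidence value (assumed:
    higher means more confident). `threshold` is a hypothetical tuning
    parameter that trades precision against recall.
    """
    if not scores:
        return None
    best_name = max(scores, key=scores.get)
    # Reject the face as a distractor if the best match is not confident enough.
    return best_name if scores[best_name] >= threshold else None


if __name__ == "__main__":
    print(identify_open_universe({"George Clooney": 0.91, "Angelina Jolie": 0.12}, 0.5))
    print(identify_open_universe({"George Clooney": 0.20, "Angelina Jolie": 0.18}, 0.5))
```

Sweeping the threshold over its range is what traces out a precision-recall curve for a given algorithm.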

Paper

If you want to use our dataset, please cite our CVPR 2013 workshop paper:

Becker, B. C. and Ortiz, E.G. “Evaluating Open-Universe Face Identification on the Web,” In CVPR 2013, Analysis and Modeling of Faces and Gestures Workshop.

Download Dataset

The PubFig83+LFW dataset comes in several parts:

  • MATLAB Face Recognition Toolbox: A MATLAB toolbox that contains our algorithm, LASRC, along with many other recent face recognition algorithms and an extensible way to plug in your own algorithm for evaluation. NOTE: The toolbox is slightly upgraded from the version used to generate the paper results (for the paper we applied a decimation step before PCA, which this version of the toolbox omits), and it yields slightly better results for all algorithms.
  • Aligned Face Images (409 MB): Faces from PubFig83 and LFW, organized into train, test, and distract directories. Faces are pre-aligned by eye position using PittPatt-reported fiducials and filtered to reduce the chance that an algorithm fails to find a face (i.e., only faces that PittPatt, SHORE, Google Picasa, MS Photo Gallery, and Apple iPhoto found were kept).
  • Feature Vectors (182 MB): HOG, LBP, and Gabor wavelet features extracted from the aligned face images, concatenated, and reduced to 2048 dimensions with PCA. For our paper, we used only the first 1536 dimensions of the descriptors.
  • Raw Face Images (381 MB): Minimally modified faces from PubFig83 and LFW. Faces have been resized to 250×250 and filtered to reduce the chance that an algorithm fails to find a face (i.e., only faces that PittPatt, SHORE, Google Picasa, MS Photo Gallery, and Apple iPhoto found were kept). Note that PubFig83 and LFW use different cropping, so you must perform your own processing to bring the datasets into alignment.
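
For readers working outside MATLAB, the following sketch shows one way to index the aligned images by split and to keep only the first 1536 feature dimensions, as we did for the paper. The split names train, test, and distract come from the description above; everything else is an assumption made for illustration, including the function names, the accepted image extensions, and the idea that each image sits in a subfolder named after the person. Check the downloaded archive for the actual layout and file formats.

```python
from pathlib import Path
from typing import List, Sequence, Tuple

SPLITS = ("train", "test", "distract")        # split names from the dataset description
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}    # assumed image extensions


def index_split(root: str, split: str) -> List[Tuple[Path, str]]:
    """List (image_path, label) pairs for one split.

    Assumption: the label is the name of the folder that directly
    contains the image (for example, one folder per person).
    """
    if split not in SPLITS:
        raise ValueError(f"unknown split: {split!r}")
    samples = []
    for path in sorted((Path(root) / split).rglob("*")):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            samples.append((path, path.parent.name))
    return samples


def truncate_features(vector: Sequence[float], keep: int = 1536) -> List[float]:
    """Keep only the leading PCA dimensions of a 2048-dimensional descriptor."""
    return list(vector[:keep])
```

Because PCA orders dimensions by explained variance, truncating to the leading dimensions discards the least informative components first.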

Reported Results

We present the best results we know of on our dataset, as PR curves along with tables of precision at 95% recall and average precision. We encourage authors to contact brian@briancbecker.com so that we can showcase your results here alongside the published papers. For more details, see our paper above.
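
As a rough guide to the two summary metrics, the sketch below computes precision at a target recall and average precision from a list of confidence scores and binary correctness labels. It assumes a generic ranking formulation (label 1 for a correct identification, 0 otherwise), takes precision at a recall level as the best precision achievable at or above that recall, and computes average precision as the mean precision at the rank of each positive. These conventions and all function names are assumptions for illustration; the exact evaluation protocol is defined in the paper and toolbox.

```python
from typing import List, Sequence, Tuple


def pr_points(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float, int]]:
    """Return (recall, precision, label) at every rank, highest score first."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    total_positives = sum(1 for _, label in ranked if label)
    if total_positives == 0:
        raise ValueError("at least one positive label is required")
    points = []
    true_positives = 0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            true_positives += 1
        points.append((true_positives / total_positives, true_positives / rank, label))
    return points


def precision_at_recall(scores: Sequence[float], labels: Sequence[int], target: float = 0.95) -> float:
    """Best precision among operating points whose recall reaches `target`."""
    return max(p for r, p, _ in pr_points(scores, labels) if r >= target)


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Mean of the precision values measured at the rank of each positive."""
    hits = [p for _, p, label in pr_points(scores, labels) if label]
    return sum(hits) / len(hits)


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.7, 0.6]
    labels = [1, 0, 1, 1]
    print(precision_at_recall(scores, labels))  # 0.75
    print(average_precision(scores, labels))
```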

Research Algorithms

Client-Side Libraries

Cloud-Based APIs

Consumer Applications