data.tgz contains:

1. Image URLs and their metadata (tags, descriptions, etc.) from Flickr
2. News articles from the New York Times, published from 1900 to 2016, in stemmed bag-of-words format
3. Inferred topic distributions for the news articles, and the top words in each topic
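
Since the file names inside the archive are not listed here, it can help to inspect it before extracting. The following is a minimal sketch that uses Python's standard `tarfile` module; the helper name `list_archive_members` is hypothetical, and the only assumption is that data.tgz is a gzip-compressed tar archive, as the extension suggests.

```python
import tarfile


def list_archive_members(path="data.tgz"):
    """Return the names of the regular files stored in a .tgz archive.

    Hypothetical helper: nothing is extracted, the archive is only read.
    """
    # "r:gz" opens the archive for reading with gzip decompression.
    with tarfile.open(path, mode="r:gz") as archive:
        # Skip directories and other non-file entries.
        return [member.name for member in archive.getmembers() if member.isfile()]
```

Calling `list_archive_members("data.tgz")` returns the stored file paths, which should show which files hold the Flickr metadata, the bag-of-words articles, and the topic outputs.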