the hosts who offer them. The data collection process spanned a period of approximately five years,
from mid-2012 to the end of 2016. We performed scrapes at irregular intervals between 2012 and
2014 and at weekly intervals starting in January 2015.¹²
Our scraping algorithm collected all listing information available to users of the website, including
the property location, the daily price, the average star rating, a list of photos, the guest
capacity, the number of bedrooms and bathrooms, a list of amenities such as WiFi and air conditioning,
and the list of all reviews from guests who have stayed at the property.¹³ Airbnb host
information includes the host name and photograph, a brief profile description, and the year-month
in which the user registered as a host on Airbnb.
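As an illustration only, the fields listed above could be organized into records along the following lines. This is a minimal sketch, not the structure actually used for the dataset: every attribute name, the identifier fields `listing_id` and `host_id`, the `scrape_date` field, and the helper `count_unique` are assumptions introduced here for exposition.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class HostRecord:
    """Host fields named in the text; attribute names are illustrative."""
    host_id: str                 # hypothetical identifier, not described in the text
    name: str
    photo_url: str
    profile_description: str
    registered_year_month: str   # year-month of host registration, e.g. "2012-06"


@dataclass
class ListingSnapshot:
    """One listing as observed in one scrape; attribute names are illustrative."""
    listing_id: str              # hypothetical identifier
    host_id: str                 # hypothetical link to HostRecord
    scrape_date: date            # assumed: each scrape is stored with its date
    city: str
    zipcode: str
    daily_price: float
    avg_star_rating: Optional[float]  # None when no rating is displayed
    guest_capacity: int
    bedrooms: int
    bathrooms: float             # assumed float to allow half bathrooms
    photos: List[str] = field(default_factory=list)
    amenities: List[str] = field(default_factory=list)
    reviews: List[str] = field(default_factory=list)


def count_unique(snapshots):
    """Count distinct listings and distinct hosts across repeated scrapes."""
    listings = {s.listing_id for s in snapshots}
    hosts = {s.host_id for s in snapshots}
    return len(listings), len(hosts)
```

Under these assumptions, a listing observed in several weekly scrapes contributes several snapshots but is counted once, which is the sense in which listing and host totals are reported.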
Our final dataset contains detailed information about 1,097,697 listings and 682,803 hosts spanning
a period of nine years, from 2008 to 2016. Because of Airbnb’s dominance in the home-sharing
market, we believe that this data represents the most comprehensive picture of home-sharing in
the U.S. ever constructed for independent research.
4.3 Calculating the number of Airbnb listings, 2008-2016
Once we have collected the data, the next step is to define a measure of Airbnb supply. This task
requires two choices: First, we need to choose the geographic granularity of our measure; second,
we need to define the entry and exit dates of each listing on the Airbnb platform. Regarding
the geographic aggregation, we conduct our main analysis at the zipcode level for a few reasons.
First, it is the lowest level of geography for which we can reliably assign listings without error
(other than user input error).¹⁴ Second, neighborhoods are a natural unit of analysis for housing
markets because there is significant heterogeneity in housing markets across neighborhoods within
cities but comparatively less heterogeneity within neighborhoods. Zipcodes will be our proxy for
neighborhoods. Third, conducting the analysis at the zipcode level as opposed to the city level helps
with identification: we can compare zipcodes within cities, thus controlling for
any unobserved city-level factors that may be unrelated to Airbnb but that affect all neighborhoods
¹² In their paper, Horn and Merante (2017) incorrectly state that our Airbnb dataset comes from InsideAirbnb.com
(probably referencing an older version of this paper); in fact, the current results are based on data that one of
the authors of this paper scraped and collected.
¹³ Airbnb does not reveal the exact street address or coordinates of the property for privacy reasons; however, the
listing’s city, street, and zipcode correspond to the property’s real location.
¹⁴ Airbnb does report the latitude and longitude of each property but only up to a perturbation of a few hundred
meters. So it would be possible, but complicated, to aggregate the listings to finer geographies with some error.
Electronic copy available at: https://ssrn.com/abstract=3006832