content-based properties can be identified at low cost
(with no additional user effort) and that people are influenced
by these regularities make a compelling reason
to investigate how best to use them.
In what situations are ratings alone insufficient?
Social-filtering makes sense when there are enough
other users known to the system with overlapping
characteristics. Typically, the requirement for over-
lap in most of these systems is that the users of the
system rate the same items in order to be judged
similar/dissimilar to each other. Whether enough overlap
exists depends on the current state of the system -- the number of users
and the number and selection of movies that have been
rated.
As an example of the limitations of using ratings
alone, consider the case of an artifact for which no
ratings are available, such as when a new movie comes
out. Since there will be a period of time when a recom-
mendation system will have little ratings data for this
movie, the recommendation system will initially not
be able to recommend this movie reliably. However, a
system which makes use of content might be able to
make predictions for this movie even in the absence of
ratings.
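The new-movie limitation can be illustrated with a minimal sketch. The toy ratings, the genre labels, and both helper functions below are hypothetical and are not part of the system described in this paper; they only show that a ratings-only predictor has nothing to go on for an unrated movie, while a content-aware one may still produce an estimate.

```python
from statistics import mean

# Hypothetical toy data: ratings[user][movie] on a 1-10 scale.
ratings = {
    "ann": {"Alien": 9, "Heat": 4},
    "bob": {"Alien": 8, "Heat": 5},
}
# Hypothetical content descriptions: one genre per movie.
genres = {"Alien": "sci-fi", "Heat": "crime", "NewRelease": "sci-fi"}

def ratings_only_prediction(movie):
    """Mean rating over all users, or None when nobody has rated the movie."""
    seen = [r[movie] for r in ratings.values() if movie in r]
    return mean(seen) if seen else None

def content_prediction(user, movie):
    """Mean of the user's own ratings for movies that share the movie's genre."""
    same_genre = [score for m, score in ratings[user].items()
                  if genres[m] == genres[movie]]
    return mean(same_genre) if same_genre else None

print(ratings_only_prediction("NewRelease"))    # no ratings exist yet
print(content_prediction("ann", "NewRelease"))  # falls back on genre
```

Here the ratings-only predictor returns nothing for the new release, whereas the genre-based fallback reuses what the user thought of other science-fiction movies.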
In this paper, we present a new, inductive learn-
ing approach to recommendation. We show how pure
social-filtering can be accomplished using this ap-
proach, how the naive introduction of content-based
information does not help -- and indeed harms -- the
recommendation process, and finally, how the use of
hybrid features that combine elements of social and
content-based information makes it possible to achieve
more accurate recommendations. We use the problem
of movie recommendation as our exploratory domain
for this work since it provides a domain with a large
amount of data (over 45,000 movie evaluations across
more than 250 people), as well as a baseline social-
filtering method to which we can compare our results
(Hill, Stead, Rosenstein & Furnas 1995).
The Movie Recommendation Problem
As noted above, in the social filtering approach, a rec-
ommendation system is given as input a set of ratings
of specific artifacts for a particular user. In recom-
mending movies, for instance, this input would be a
set of movies that the user had seen, with some numer-
ical rating associated with each of these movies. The
output of the recommendation system is another set of
artifacts, not yet rated by the user, which the recom-
mendation system predicts the user will rate highly.
Social-filtering systems would solve this problem by
focusing solely on the movie ratings for each user, and
by computing from these ratings a function that can
give a rating to a user for a movie that others have
rated but the user has not. These systems have tradi-
tionally output ratings for movies, rather than a binary
label. They compute ratings for unseen objects by finding
similarities between people’s preferences about the
rated items. Similarity assessments are made amongst
individual users of a system and are computed using
a variety of statistical techniques. For example, Rec-
ommender computes for a user a smaller group of ref-
erence users known as recommenders. These recom-
menders are other members of the community most
similar to the user. Using regression techniques, these
recommenders’ ratings are then used to predict ratings
for new movies. In this social recommendation
approach, recommended movies are usually presented
to the user as a rank-ordered list.
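The general shape of such a social-filtering predictor can be sketched as follows. This is an illustration under stated assumptions, not the Recommender algorithm itself: similarity is taken to be Pearson correlation over co-rated movies, and a similarity-weighted average stands in for the regression step. The function names and the parameter k are hypothetical.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation of two users over the movies both have rated."""
    common = sorted(set(a) & set(b))
    if len(common) < 2:
        return 0.0
    xs = [a[m] for m in common]
    ys = [b[m] for m in common]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

def predict(ratings, user, movie, k=2):
    """Similarity-weighted average over the k most similar users who rated
    `movie` (an assumed stand-in for the regression step)."""
    sims = [(pearson(ratings[user], r), r[movie])
            for u, r in ratings.items()
            if u != user and movie in r]
    top = [(s, r) for s, r in sorted(sims, reverse=True)[:k] if s > 0]
    if not top:
        return None  # no positively correlated user has rated this movie
    return sum(s * r for s, r in top) / sum(s for s, _ in top)

def ranked_recommendations(ratings, user):
    """Movies the user has not rated, ordered by predicted rating, best first."""
    unseen = {m for r in ratings.values() for m in r} - set(ratings[user])
    scored = [(predict(ratings, user, m), m) for m in unseen]
    scored = [(p, m) for p, m in scored if p is not None]
    return [m for p, m in sorted(scored, reverse=True)]

# Hypothetical toy data on a 1-10 scale.
ratings = {
    "ann": {"A": 8, "B": 2, "C": 6},
    "bob": {"A": 9, "B": 3, "C": 7, "D": 9},
    "cat": {"A": 2, "B": 8, "C": 4, "D": 3},
}
print(ranked_recommendations(ratings, "ann"))
```

In this toy data bob's ratings track ann's closely while cat's run opposite to them, so only bob contributes to ann's predicted rating for movie D.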
Content-based recommendation systems, on the
other hand, would reflect solely the non-ratings infor-
mation. For each user they would take a description of
each liked and disliked movie, and learn a procedure
that would take the description of a new movie and
predict whether it will be liked or disliked by the user.
For each user a separate recommendation procedure
would be used.
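A per-user content-based procedure of this kind might look like the sketch below. It assumes each movie description is a set of descriptive features (for example, genre words) and uses a simple feature-voting rule. Both the representation and the learning rule are illustrative assumptions, not a method taken from this paper.

```python
from collections import Counter

def train_user_model(liked, disliked):
    """Weight each feature by (occurrences in liked) - (occurrences in disliked).

    `liked` and `disliked` are lists of feature sets describing the movies
    that one particular user liked or disliked.
    """
    weights = Counter()
    for description in liked:
        weights.update(description)
    for description in disliked:
        weights.subtract(description)
    return weights

def predict_label(weights, description):
    """Label a new movie from its description alone; no ratings are consulted."""
    score = sum(weights[feature] for feature in description)
    return "liked" if score > 0 else "disliked"

# One model per user, trained only on that user's own liked and disliked movies.
model = train_user_model(
    liked=[{"sci-fi", "thriller"}, {"sci-fi"}],
    disliked=[{"romance", "comedy"}],
)
print(predict_label(model, {"sci-fi", "comedy"}))
```

Because the model sees only one user's history, nothing learned about one user carries over to another.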
Our Approach
The goal of our work is to develop an approach to
recommendation that can exploit both ratings and
content information. We depart from the traditional
social-filtering approach to recommendation by fram-
ing the problem as one of classification, rather than
artifact rating. On the other hand, we differ from
content-based filtering methods in that social informa-
tion, in the form of other users’ ratings, will be used
in the inductive learning process.
In particular, we will formalize the movie recommen-
dation problem as a learning problem--specifically, the
problem of learning a function that takes as its input a
user and a movie and produces as output a label indi-
cating whether the movie would be liked (and therefore
recommended) or disliked:
$f(\langle \mathit{user}, \mathit{movie} \rangle) \rightarrow \{\mathit{liked}, \mathit{disliked}\}$
Because we frame this as a classification problem, we are interested
in predicting whether a movie is liked or disliked, not
an exact rating. Our output is also not an ordered list
of movies, but a set of movies which we predict will
be liked by the user. Most importantly, we are now
able to generalize our inputs to the problem to other
information describing both users and movies.
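One way to picture this generalized input is as a feature description of each user/movie pair that a learner can consume. The encoding below is a hypothetical sketch: the field names and the two lookup tables are assumptions made for illustration, not the representation used in our experiments.

```python
def pair_features(user, movie, liked_by, movie_info):
    """Describe a (user, movie) pair with both social and content information.

    liked_by:   hypothetical map from movie to the set of users who liked it
    movie_info: hypothetical map from movie to a dict of content attributes
    """
    features = {"user": user}
    # Social information: which *other* users liked this movie.
    features["liked_by_others"] = frozenset(liked_by.get(movie, set()) - {user})
    # Content information: whatever attributes are known for the movie.
    features.update(movie_info.get(movie, {}))
    return features

example = pair_features(
    "ann", "Alien",
    liked_by={"Alien": {"ann", "bob"}},
    movie_info={"Alien": {"genre": "sci-fi"}},
)
print(example)
```

A classifier playing the role of f would then map such a description to one of the two labels, liked or disliked.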
The information we have available for this process
is a collection of user/movie ratings (on a scale of
1-10), and certain additional information concerning
each movie.¹
To present the results as sets of movies
predicted to be liked or disliked by a user, we compute a
ratings threshold for each user such that 1/4 of all the
user’s ratings exceed it and the remaining 3/4 do not, and
we return as recommended any movie whose predicted
rating is above this training-data-based threshold.
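The thresholding step can be sketched as follows, assuming each user has at least one training rating. The function names are hypothetical. When ratings are tied at the threshold, fewer than 1/4 of them may strictly exceed it; that tie handling is a choice made in this sketch, not something specified above.

```python
def user_threshold(user_ratings):
    """Return a rating t such that about 1/4 of the user's ratings exceed t."""
    ordered = sorted(user_ratings)
    n = len(ordered)
    idx = (3 * n + 3) // 4 - 1  # ceil(3n/4) - 1: last rating in the lower 3/4
    return ordered[idx]

def recommended(predicted, threshold):
    """Set of movies whose predicted rating is above the user's threshold."""
    return {movie for movie, rating in predicted.items() if rating > threshold}

training_ratings = [1, 2, 3, 4, 5, 6, 7, 8]  # hypothetical ratings by one user
t = user_threshold(training_ratings)
print(t, recommended({"X": 7.2, "Y": 5.9}, t))
```

With eight distinct ratings the threshold falls on the sixth-lowest, so exactly two of the eight training ratings exceed it.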
¹It would be desirable to make the recommendation process
a function of user attributes such as age or gender, but
since that information is not available in the data we are
using in this paper, we are forced to neglect it here.