Chapter 2, Part 12
From p. 27:
We use the movie rating dataset created by the GroupLens Project at the University of Minnesota.
MovieLens(http://www.grouplens.org/node/73)
mkdir data
mkdir data/movielens
cd data/movielens
wget http://www.grouplens.org/system/files/ml-data_0.zip
unzip ml-data_0.zip
I put the 100,000-rating version of the dataset under ./data/movielens.
The function that loads it goes into recommendations.rb:
http://www.bitbucket.org/shokai/collective-intelligence-study/src/e6e7d1aa44e1/recommendations.rb
From p. 28:
def loadMovieLens(path='./data/movielens')
  # Get the movie titles (u.item is '|'-separated: id|title|...)
  movies = Hash.new
  File.foreach(path + '/u.item') { |line|
    (id, title) = line.split('|')[0..1]
    movies[id] = title
  }
  # Load the ratings (u.data is TAB-separated: user, movie id, rating, timestamp)
  prefs = Hash.new
  File.foreach(path + '/u.data') { |line|
    (user, movieid, rating, ts) = line.split("\t")
    prefs[user] = Hash.new if !prefs.key?(user)
    prefs[user][movies[movieid]] = rating.to_f
  }
  return prefs
end
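To check the parsing logic without downloading the dataset, the same steps can be run against in-memory strings. This is only a minimal sketch: `build_prefs` is a hypothetical helper that is not part of recommendations.rb, and the sample rows are illustrative lines written in the u.item / u.data layout, not taken from the real files.

```ruby
require 'stringio'

# Hypothetical helper (not in the original recommendations.rb): builds the
# same nested hash as loadMovieLens, but from any IO-like objects.
def build_prefs(item_io, data_io)
  # movie id => title, from '|'-separated rows
  movies = {}
  item_io.each_line do |line|
    id, title = line.chomp.split('|')[0..1]
    movies[id] = title
  end

  # user id => { title => rating }, from TAB-separated rows
  prefs = {}
  data_io.each_line do |line|
    user, movie_id, rating, _ts = line.chomp.split("\t")
    (prefs[user] ||= {})[movies[movie_id]] = rating.to_f
  end
  prefs
end

# Illustrative sample rows (assumed layout: id|title|... and user TAB movie TAB rating TAB timestamp)
SAMPLE_ITEMS = "1|Toy Story (1995)|01-Jan-1995\n2|GoldenEye (1995)|01-Jan-1995\n"
SAMPLE_DATA  = "196\t1\t3\t881250949\n196\t2\t4\t881250950\n22\t1\t5\t878887116\n"

SAMPLE_PREFS = build_prefs(StringIO.new(SAMPLE_ITEMS), StringIO.new(SAMPLE_DATA))
p SAMPLE_PREFS
```

Note that user IDs and movie IDs stay strings here, which is why the irb session later asks for user '87' rather than 87.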
Try it in irb. First, some setup:
>> require 'pp'
=> true
>> require 'recommendations.rb'
=> true
>> c = Critics.new
=> #<Critics:0x66cd08 @users={"Jack Matthews"=>{"The Night Listener"=>3.0, "Superman Returns"=>5.0, "Lady in the Water"=>3.0, "Snake on a Plane"=>4.0, "You, Me and Dupree"=>3.5}, "Gene Seymour"=>{"The Night Listener"=>3.0, "Superman Returns"=>5.0, "Lady in the Water"=>3.0, "Snake on a Plane"=>3.5, "You, Me and Dupree"=>3.5, "Just My Luck"=>1.5}, "Mick LaSalle"=>{"The Night Listener"=>3.0, "Superman Returns"=>3.0, "Lady in the Water"=>3.0, "Snake on a Plane"=>4.0, "You, Me and Dupree"=>2.0, "Just My Luck"=>2.0}, "Toby"=>{"Superman Returns"=>4.0, "Snake on a Plane"=>4.5, "You, Me and Dupree"=>1.0}, "Claudia Puig"=>{"The Night Listener"=>4.5, "Superman Returns"=>4.0, "Snake on a Plane"=>3.5, "You, Me and Dupree"=>2.5, "Just My Luck"=>3.0}, "Lisa Rose"=>{"The Night Listener"=>3.0, "Superman Returns"=>3.5, "Lady in the Water"=>2.5, "Snake on a Plane"=>3.5, "You, Me and Dupree"=>2.5, "Just My Luck"=>3.0}, "Michael Phillips"=>{"The Night Listener"=>4.0, "Superman Returns"=>3.5, "Lady in the Water"=>2.5, "Snake on a Plane"=>3.0}}>
Load the MovieLens ratings:
>> prefs = c.loadMovieLens()
The contents of prefs are long, so I omit them here.
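Instead of printing the whole hash, a quick summary is usually enough to confirm that the load worked. The following is a minimal sketch; `summarize_prefs` is a hypothetical helper of my own, and the toy hash only imitates the shape that loadMovieLens returns.

```ruby
# Hypothetical helper: summarize a prefs hash ({ user => { title => rating } })
# instead of printing all of it.
def summarize_prefs(prefs)
  counts = prefs.values.map(&:size) # number of ratings per user
  {
    users: prefs.size,
    ratings: counts.sum,
    max_per_user: counts.max || 0
  }
end

# Toy data in the same shape as the real prefs (made up for illustration).
TOY_PREFS = {
  '1' => { 'Movie A' => 4.0, 'Movie B' => 2.0 },
  '2' => { 'Movie A' => 5.0 }
}
TOY_SUMMARY = summarize_prefs(TOY_PREFS)
p TOY_SUMMARY
```

On the real data, the ratings count can be lower than the number of rows in u.data if two movie IDs share a title, because the hash is keyed by title.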
Now run user-based recommendation.
Top movie recommendations for user 87 (note that [0..30] returns 31 entries, not 30):
>> pp c.getRecommendations(prefs, '87')[0..30]
[{5.0=>"Entertaining Angels: The Dorothy Day Story (1996)"},
 {4.89884443128923=>"Legal Deceit (1997)"},
 {4.81501908224271=>"Letter From Death Row, A (1998)"},
 {4.73210829839414=>"Hearts and Minds (1996)"},
 {4.69624446649087=>"Pather Panchali (1955)"},
 {4.65239706102676=>"Lamerica (1994)"},
 {4.53872369347481=>"Leading Man, The (1996)"},
 {4.5350813391061=>"Mrs. Dalloway (1997)"},
 {4.53233761257298=>"Innocents, The (1961)"},
 {4.52799857474708=>"Casablanca (1942)"},
 {4.51027014971986=>"Everest (1998)"},
 {4.49396775542844=>"Dangerous Beauty (1998)"},
 {4.48515130180134=>"Wallace & Gromit: The Best of Aardman Animation (1996)"},
 {4.46328746129022=>"Wrong Trousers, The (1993)"},
 {4.45097943694103=>"Kaspar Hauser (1993)"},
 {4.43107907117952=>"Usual Suspects, The (1995)"},
 {4.42752068286496=>"Maya Lin: A Strong Clear Vision (1994)"},
 {4.41487078459208=>"Wedding Gift, The (1994)"},
 {4.37744525265646=>"Affair to Remember, An (1957)"},
 {4.37607111044777=>"Good Will Hunting (1997)"},
 {4.37601109900139=>"As Good As It Gets (1997)"},
 {4.37414617950098=>"Anna (1996)"},
 {4.3674372665046=>"Close Shave, A (1995)"},
 {4.35749999413449=>"Quiet Room, The (1996)"},
 {4.34300367270454=>"Rear Window (1954)"},
 {4.33902330272137=>"Some Folks Call It a Sling Blade (1993)"},
 {4.32933764565644=>"Silence of the Lambs, The (1991)"},
 {4.32762689268298=>"Titanic (1997)"},
 {4.32603958144363=>"Angel Baby (1995)"},
 {4.32048262503413=>"12 Angry Men (1957)"},
 {4.29835808870139=>"One Flew Over the Cuckoo's Nest (1975)"}]
=> nil
Top recommendations for user 150 (again 31 entries, because of [0..30]):
>> pp c.getRecommendations(prefs, '150')[0..30]
[{5.0=>"Year of the Horse (1997)"},
 {4.9166073761723=>"Horse Whisperer, The (1998)"},
 {4.75436201814768=>"Pather Panchali (1955)"},
 {4.69074952511793=>"Paths of Glory (1957)"},
 {4.68910330808147=>"Duoluo tianshi (1995)"},
 {4.62728716563422=>"For Whom the Bell Tolls (1943)"},
 {4.56397449120659=>"Braindead (1992)"},
 {4.52242528502259=>"Meet John Doe (1941)"},
 {4.50087864855038=>"Close Shave, A (1995)"},
 {4.48430005348334=>"Casablanca (1942)"},
 {4.48350194027013=>"Wrong Trousers, The (1993)"},
 {4.47051328646972=>"Empire Strikes Back, The (1980)"},
 {4.45673307177153=>"Flirt (1995)"},
 {4.44786936549563=>"Four Days in September (1997)"},
 {4.44073219146591=>"Schindler's List (1993)"},
 {4.42155110148068=>"Shawshank Redemption, The (1994)"},
 {4.41313936387175=>"Anna (1996)"},
 {4.40611002274868=>"Wallace & Gromit: The Best of Aardman Animation (1996)"},
 {4.36757178073924=>"Infinity (1996)"},
 {4.36591223004216=>"Raiders of the Lost Ark (1981)"},
 {4.35177367085383=>"Rear Window (1954)"},
 {4.33271794575692=>"Cinema Paradiso (1988)"},
 {4.33168712420195=>"12 Angry Men (1957)"},
 {4.32551778838972=>"Top Hat (1935)"},
 {4.32318212065411=>"Silence of the Lambs, The (1991)"},
 {4.32147281082258=>"Third Man, The (1949)"},
 {4.3214132692368=>"Roommates (1995)"},
 {4.30944843988461=>"Usual Suspects, The (1995)"},
 {4.30226544810509=>"To Kill a Mockingbird (1962)"},
 {4.30216852986268=>"Manchurian Candidate, The (1962)"},
 {4.30115810920849=>"His Girl Friday (1940)"}]
=> nil
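For reference, user-based recommendation of this kind can be sketched as a similarity-weighted average of other users' ratings. This is not the getRecommendations from recommendations.rb, whose body is not shown in this post. It is a minimal sketch that assumes Pearson correlation as the similarity measure. `sim_pearson` and `recommend` are names chosen for illustration, the toy data is made up, and the result is a list of [score, title] pairs rather than the {score=>title} hashes seen above.

```ruby
# Pearson correlation between two users over the items both have rated.
# Returns 0.0 when there is no overlap or no variance.
def sim_pearson(prefs, p1, p2)
  shared = prefs[p1].keys & prefs[p2].keys
  n = shared.size.to_f
  return 0.0 if n.zero?

  x = shared.map { |item| prefs[p1][item] }
  y = shared.map { |item| prefs[p2][item] }
  sum_x = x.sum
  sum_y = y.sum
  num = x.zip(y).sum { |a, b| a * b } - sum_x * sum_y / n
  den_sq = (x.sum { |a| a * a } - sum_x**2 / n) *
           (y.sum { |b| b * b } - sum_y**2 / n)
  return 0.0 if den_sq <= 0 # also guards against tiny negative rounding errors

  num / Math.sqrt(den_sq)
end

# Predicted rating for each unseen item: the average of other users' ratings,
# weighted by their similarity to `person`. Non-positive similarities are skipped.
def recommend(prefs, person, limit = 30)
  totals = Hash.new(0.0)
  sim_sums = Hash.new(0.0)
  prefs.each do |other, ratings|
    next if other == person
    sim = sim_pearson(prefs, person, other)
    next if sim <= 0
    ratings.each do |item, rating|
      next if prefs[person].key?(item) # only items `person` has not rated
      totals[item] += rating * sim
      sim_sums[item] += sim
    end
  end
  totals.map { |item, total| [total / sim_sums[item], item] }
        .sort_by { |score, _item| -score }
        .first(limit)
end

# Toy data (made up): B agrees with A, C disagrees with A.
TOY_RATINGS = {
  'A' => { 'm1' => 5.0, 'm2' => 1.0 },
  'B' => { 'm1' => 5.0, 'm2' => 1.0, 'm3' => 4.0, 'm4' => 2.0 },
  'C' => { 'm1' => 1.0, 'm2' => 5.0, 'm3' => 1.0, 'm4' => 5.0 }
}
TOY_RECS = recommend(TOY_RATINGS, 'A')
p TOY_RECS
```

Using `first(limit)` returns exactly `limit` entries, which avoids the off-by-one of slicing with [0..30]. The scores of exactly 5.0 at the top of both lists above may come from movies rated by only a few similar users, which is a known weakness of this weighting scheme.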