Chapter 2, Part 12

From p.27:
Use the movie rating dataset created by the GroupLens Project at the University of Minnesota.

MovieLens(http://www.grouplens.org/node/73)

mkdir data
mkdir data/movielens
cd data/movielens
wget http://www.grouplens.org/system/files/ml-data_0.zip
unzip ml-data_0.zip

I placed the 100,000-rating version of the dataset under ./data/movielens.



Write a function to load it in recommendations.rb
http://www.bitbucket.org/shokai/collective-intelligence-study/src/e6e7d1aa44e1/recommendations.rb
From p.28:

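  def loadMovieLens(path='./data/movielens')
    # Get the movie titles (u.item is '|'-separated: id|title|...)
    movies = Hash.new
    # File.foreach yields one line at a time; the older nested
    # open(...).each{ |file| file.each{ ... } } form relied on String#each,
    # which was removed in Ruby 1.9
    File.foreach(path + '/u.item'){ |line|
      (id,title) = line.split('|')[0..1]
      movies[id] = title
    }

    # Load the ratings (u.data is TAB-separated: user, movie id, rating, timestamp)
    prefs = Hash.new
    File.foreach(path + '/u.data'){ |line|
      (user,movieid,rating,ts) = line.chomp.split("\t")
      prefs[user] = Hash.new if !prefs.key?(user)
      prefs[user][movies[movieid]] = rating.to_f
    }
    return prefs
  end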
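
To make the shape of the returned hash concrete, here is a minimal sketch that applies the same parsing steps to a few in-memory lines instead of the real files. The helper name parse_movielens and the sample rows are made up for illustration; only the '|' and TAB separators come from the function above.

```ruby
# Hypothetical helper: the same parsing as loadMovieLens, but over arrays of
# lines so that it runs without the dataset on disk.
def parse_movielens(item_lines, data_lines)
  # movie id => title
  movies = {}
  item_lines.each do |line|
    id, title = line.split('|')[0..1]
    movies[id] = title
  end

  # user id => { title => rating }
  prefs = {}
  data_lines.each do |line|
    user, movieid, rating, _ts = line.chomp.split("\t")
    prefs[user] ||= {}
    prefs[user][movies[movieid]] = rating.to_f
  end
  prefs
end

# Made-up sample rows in the u.item / u.data layout
items = ["1|Toy Story (1995)|01-Jan-1995||", "2|GoldenEye (1995)|01-Jan-1995||"]
data  = ["87\t1\t5\t881250949\n", "87\t2\t3\t881250950\n", "150\t1\t4\t881250951\n"]

sample_prefs = parse_movielens(items, data)
p sample_prefs
```

User IDs stay strings (which is why the calls below pass '87' and not 87), titles become the inner keys, and ratings are converted to floats.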


Try it in irb. First, the setup:

>> require 'pp'
=> true
>> require 'recommendations.rb'
=> true
>> c = Critics.new
=> #<Critics:0x66cd08 @users={"Jack Matthews"=>{"The Night Listener"=>3.0, "Superman Returns"=>5.0, "Lady in the Water"=>3.0, "Snake on a Plane"=>4.0, "You, Me and Dupree"=>3.5}, "Gene Seymour"=>{"The Night Listener"=>3.0, "Superman Returns"=>5.0, "Lady in the Water"=>3.0, "Snake on a Plane"=>3.5, "You, Me and Dupree"=>3.5, "Just My Luck"=>1.5}, "Mick LaSalle"=>{"The Night Listener"=>3.0, "Superman Returns"=>3.0, "Lady in the Water"=>3.0, "Snake on a Plane"=>4.0, "You, Me and Dupree"=>2.0, "Just My Luck"=>2.0}, "Toby"=>{"Superman Returns"=>4.0, "Snake on a Plane"=>4.5, "You, Me and Dupree"=>1.0}, "Claudia Puig"=>{"The Night Listener"=>4.5, "Superman Returns"=>4.0, "Snake on a Plane"=>3.5, "You, Me and Dupree"=>2.5, "Just My Luck"=>3.0}, "Lisa Rose"=>{"The Night Listener"=>3.0, "Superman Returns"=>3.5, "Lady in the Water"=>2.5, "Snake on a Plane"=>3.5, "You, Me and Dupree"=>2.5, "Just My Luck"=>3.0}, "Michael Phillips"=>{"The Night Listener"=>4.0, "Superman Returns"=>3.5, "Lady in the Water"=>2.5, "Snake on a Plane"=>3.0}}>


Load the MovieLens ratings:

>> prefs = c.loadMovieLens()

The contents of prefs are long, so they are omitted here.



Run user-based recommendation.
Top movie recommendations for user 87 (the slice [0..30] returns 31 entries):

>> pp c.getRecommendations(prefs, '87')[0..30]
[{5.0=>"Entertaining Angels: The Dorothy Day Story (1996)"},
 {4.89884443128923=>"Legal Deceit (1997)"},
 {4.81501908224271=>"Letter From Death Row, A (1998)"},
 {4.73210829839414=>"Hearts and Minds (1996)"},
 {4.69624446649087=>"Pather Panchali (1955)"},
 {4.65239706102676=>"Lamerica (1994)"},
 {4.53872369347481=>"Leading Man, The (1996)"},
 {4.5350813391061=>"Mrs. Dalloway (1997)"},
 {4.53233761257298=>"Innocents, The (1961)"},
 {4.52799857474708=>"Casablanca (1942)"},
 {4.51027014971986=>"Everest (1998)"},
 {4.49396775542844=>"Dangerous Beauty (1998)"},
 {4.48515130180134=>"Wallace & Gromit: The Best of Aardman Animation (1996)"},
 {4.46328746129022=>"Wrong Trousers, The (1993)"},
 {4.45097943694103=>"Kaspar Hauser (1993)"},
 {4.43107907117952=>"Usual Suspects, The (1995)"},
 {4.42752068286496=>"Maya Lin: A Strong Clear Vision (1994)"},
 {4.41487078459208=>"Wedding Gift, The (1994)"},
 {4.37744525265646=>"Affair to Remember, An (1957)"},
 {4.37607111044777=>"Good Will Hunting (1997)"},
 {4.37601109900139=>"As Good As It Gets (1997)"},
 {4.37414617950098=>"Anna (1996)"},
 {4.3674372665046=>"Close Shave, A (1995)"},
 {4.35749999413449=>"Quiet Room, The (1996)"},
 {4.34300367270454=>"Rear Window (1954)"},
 {4.33902330272137=>"Some Folks Call It a Sling Blade (1993)"},
 {4.32933764565644=>"Silence of the Lambs, The (1991)"},
 {4.32762689268298=>"Titanic (1997)"},
 {4.32603958144363=>"Angel Baby (1995)"},
 {4.32048262503413=>"12 Angry Men (1957)"},
 {4.29835808870139=>"One Flew Over the Cuckoo's Nest (1975)"}]
=> nil
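
A note on the slice used here: a Ruby range written with two dots includes its end index. The sketch below shows the difference on a stand-in array (recs is made up, not the real recommendation list).

```ruby
recs = (1..100).to_a          # stand-in for a long recommendation list

inclusive = recs[0..30]       # indices 0 through 30
exclusive = recs[0...30]      # indices 0 through 29
first30   = recs.first(30)    # same elements as the exclusive slice

puts inclusive.size           # 31
puts exclusive.size           # 30
puts first30 == exclusive     # true
```

To print exactly 30 recommendations, [0...30] or first(30) would be the slice to use.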

Top movie recommendations for user 150 (again 31 entries):

>> pp c.getRecommendations(prefs, '150')[0..30]
[{5.0=>"Year of the Horse (1997)"},
 {4.9166073761723=>"Horse Whisperer, The (1998)"},
 {4.75436201814768=>"Pather Panchali (1955)"},
 {4.69074952511793=>"Paths of Glory (1957)"},
 {4.68910330808147=>"Duoluo tianshi (1995)"},
 {4.62728716563422=>"For Whom the Bell Tolls (1943)"},
 {4.56397449120659=>"Braindead (1992)"},
 {4.52242528502259=>"Meet John Doe (1941)"},
 {4.50087864855038=>"Close Shave, A (1995)"},
 {4.48430005348334=>"Casablanca (1942)"},
 {4.48350194027013=>"Wrong Trousers, The (1993)"},
 {4.47051328646972=>"Empire Strikes Back, The (1980)"},
 {4.45673307177153=>"Flirt (1995)"},
 {4.44786936549563=>"Four Days in September (1997)"},
 {4.44073219146591=>"Schindler's List (1993)"},
 {4.42155110148068=>"Shawshank Redemption, The (1994)"},
 {4.41313936387175=>"Anna (1996)"},
 {4.40611002274868=>"Wallace & Gromit: The Best of Aardman Animation (1996)"},
 {4.36757178073924=>"Infinity (1996)"},
 {4.36591223004216=>"Raiders of the Lost Ark (1981)"},
 {4.35177367085383=>"Rear Window (1954)"},
 {4.33271794575692=>"Cinema Paradiso (1988)"},
 {4.33168712420195=>"12 Angry Men (1957)"},
 {4.32551778838972=>"Top Hat (1935)"},
 {4.32318212065411=>"Silence of the Lambs, The (1991)"},
 {4.32147281082258=>"Third Man, The (1949)"},
 {4.3214132692368=>"Roommates (1995)"},
 {4.30944843988461=>"Usual Suspects, The (1995)"},
 {4.30226544810509=>"To Kill a Mockingbird (1962)"},
 {4.30216852986268=>"Manchurian Candidate, The (1962)"},
 {4.30115810920849=>"His Girl Friday (1940)"}]
=> nil
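
getRecommendations itself is not listed in this post. As a rough sketch of what user-based recommendation computes, the code below scores each unseen item by a similarity-weighted average of other users' ratings, using Pearson correlation as the similarity. This is an assumption about the general approach, not the code in recommendations.rb; the names sim_pearson and recommend, the return format, and the toy data are all made up for illustration.

```ruby
# Pearson correlation between two users over the items both have rated.
def sim_pearson(prefs, a, b)
  shared = prefs[a].keys & prefs[b].keys
  n = shared.size
  return 0.0 if n.zero?

  xs = shared.map { |item| prefs[a][item] }
  ys = shared.map { |item| prefs[b][item] }
  sum_x = xs.sum
  sum_y = ys.sum
  num   = xs.zip(ys).sum { |x, y| x * y } - sum_x * sum_y / n
  var_x = xs.sum { |x| x * x } - sum_x**2 / n
  var_y = ys.sum { |y| y * y } - sum_y**2 / n
  return 0.0 if var_x <= 0 || var_y <= 0   # no variation, correlation undefined

  num / Math.sqrt(var_x * var_y)
end

# Score each item the person has not rated by the similarity-weighted
# average of other users' ratings. Returns [[score, item], ...], best first.
def recommend(prefs, person)
  totals   = Hash.new(0.0)
  sim_sums = Hash.new(0.0)
  prefs.each do |other, ratings|
    next if other == person
    sim = sim_pearson(prefs, person, other)
    next if sim <= 0                       # skip dissimilar users
    ratings.each do |item, rating|
      next if prefs[person].key?(item)     # only score unseen items
      totals[item]   += rating * sim
      sim_sums[item] += sim
    end
  end
  totals.map { |item, total| [total / sim_sums[item], item] }
        .sort_by { |score, _item| -score }
end

# Made-up toy ratings: A has not rated m4 or m5
toy = {
  'A' => { 'm1' => 5.0, 'm2' => 3.0, 'm3' => 4.0 },
  'B' => { 'm1' => 4.0, 'm2' => 2.0, 'm3' => 3.0, 'm4' => 5.0, 'm5' => 2.0 },
  'C' => { 'm1' => 2.0, 'm2' => 5.0, 'm3' => 3.0, 'm4' => 1.0, 'm5' => 4.0 },
  'D' => { 'm1' => 4.0, 'm2' => 3.0, 'm3' => 4.0, 'm4' => 3.0 }
}

result = recommend(toy, 'A')
result.each { |score, item| puts "#{item}: #{score.round(3)}" }
```

In the toy data, C's tastes run opposite to A's, so C is ignored; B and D both push m4 above m5. Items rated by only one similar user simply inherit that user's rating, which may be why obscure titles can reach a score of exactly 5.0 at the top of the lists above.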