The Idea

As Conan O'Brien has been winding down his TBS show this summer, he's been bringing back some of his favorite guests for a last hurrah. It got me wondering which late night hosts had the deepest bench, and, once I realized that episode names were listed in billing order, if we could establish a celebrity pecking order using Elo rankings.

All data was scraped from TVMaze.com, and comes from the 2015-2021 seasons -- when all of the following six hosts were on the air:

  • Conan O'Brien
  • Seth Meyers
  • Jimmy Kimmel
  • Jimmy Fallon
  • James Corden
  • Stephen Colbert

Here's a JavaScript snippet for scraping episode info from TVMaze.com, using the xpath functionality provided by Chrome's devtools.

copy({metadata: window.location.href,
data: $x("//section[contains(@class, 'season')]/article[contains(@class, 'episode-row')]").map(row => {
    cells = [...row.children].map((x, i) => i < 3 ? x.innerText : '')
    return {
        'episodeNumber': cells[0],
        'airDate': cells[1],
        'guests': cells[2]
    }
})})
In [1]:
import pandas as pd
import json
import pickle
import copy
import numpy as np
from scipy.spatial import distance
import matplotlib.pyplot as plt

Implementation of the Elo system below is courtesy of ddm7018 on Github.

In [50]:
class Elo:
    def __init__(self,k,g=1,homefield = 100):
        self.ratingDict     = {}    
        self.k              = k
        self.g              = g
        self.homefield      = homefield

    def addPlayer(self,name,rating = 1500):
        self.ratingDict[name] = rating
        
    def gameOver(self, winner, loser, winnerHome):
        if winnerHome:
            result = self.expectResult(self.ratingDict[winner] + self.homefield, self.ratingDict[loser])
        else:
            result = self.expectResult(self.ratingDict[winner], self.ratingDict[loser]+self.homefield)

        self.ratingDict[winner] = self.ratingDict[winner] + (self.k*self.g)*(1 - result)  
        self.ratingDict[loser]  = self.ratingDict[loser] + (self.k*self.g)*(0 - (1 -result))
        
    def expectResult(self, p1, p2):
        exp = (p2-p1)/400.0
        return 1/((10.0**(exp))+1)
In [2]:
%matplotlib notebook

Initial Setup (Skippable)

We'll do some initial setup -- each season of TV is a big blob of JSON with some metadata letting me know the URL that generated it. I hacked together a dictionary providing the relevant data for each of those URLs.

Then we loop through all the seasons and all their episodes, breaking the episode titles -- which contains all of the guests names -- into individual rows.

While we're here, we'll also run the Elo calculations. I decided to treat a three guest episode as a series of one-on-one matchups, in which the lead guest defeats the second guest, the second guest defeats the third guest, and the first guest then defeats the third guest as well.

In [270]:
# Initialize Elo league -- starting rating is 1500
pecking_order = Elo(k=20, g = 1, homefield = 0)

href_lookup = eval(open('host-season-lookup.pkl', 'r', encoding='utf-8').read())
print("href_lookup looks like:")
print(list(href_lookup.items())[:3])

json_array = json.load(open('data.json', 'r', encoding='utf-8'))
dfs = []
for season in json_array:
    records = []
    index = href_lookup[season['metadata']]
    episodes = season['data']
    for ep in episodes:
        if ep['episodeNumber'] == 's':
            continue # it's a special of some sort
        splitter = ';' if ';' in ep['guests'] else ','
        guests = [x.strip() for x in ep['guests'].split(splitter)]
        for guest in guests:
            # Ensure all guests are present in the Elo rankings
            if guest not in pecking_order.ratingDict:
                pecking_order.addPlayer(guest)
        for billing, current_guest in enumerate(guests):
            record = copy.deepcopy(ep)
            del record['guests']
            record['guest'] = current_guest
            record['billing'] = billing + 1
            records.append(record)
            # Once they are, have the current guest lose to all higher-ranked guests on that episode
            # The billing - 1 is to make the range exclusive on the right side.
            for guest_slot in range(0, billing-1):
                earlier_guest = guests[guest_slot]
                pecking_order.gameOver(earlier_guest, current_guest, True)
    df = pd.DataFrame(records)
    df['host'] = index[0]
    df['season'] = index[1]
    df.set_index(['season', 'host', 'episodeNumber'], inplace=True)
    dfs.append(df)
df = pd.concat(dfs)
print("\nSample episode datum:")
print(ep)
href_lookup looks like:
[('https://www.tvmaze.com/seasons/5103/conan-season-2015/episodes', ("Conan O'Brien", 2015)), ('https://www.tvmaze.com/seasons/31170/conan-season-2016/episodes', ("Conan O'Brien", 2016)), ('https://www.tvmaze.com/seasons/57612/conan-season-2017/episodes', ("Conan O'Brien", 2017))]

Sample episode datum:
{'episodeNumber': 'S', 'airDate': 'Jan 24, 2021', 'guests': 'AFC Championship Special'}
In [271]:
node_ids = pd.DataFrame(list(set(df.guest.values)))
node_ids.index.name = 'Id'
node_ids = node_ids.reset_index().set_index(0)
node_ids.index.name = 'Label'
inverse_node_ids = node_ids.reset_index().set_index('Id')

Data Exploration

In [272]:
guest_data = df.reset_index().groupby(['guest', 'host']).agg([np.size, np.mean])
appearances_per_show = guest_data.billing.sort_values('size', ascending=False).reset_index()
aps = appearances_per_show.pivot_table(values='size', index='guest', columns='host', margins=True, fill_value=0, margins_name='total', aggfunc=sum)
In [278]:
aps.sort_values('total', ascending=False).head(20)
Out[278]:
host Conan O'Brien James Corden Jimmy Fallon Jimmy Kimmel Seth Meyers Stephen Colbert total
guest
total 1998 2659 4072 3251 3336 3124 18440
Bernie Sanders 1 0 5 7 12 14 39
John Oliver 0 0 7 2 11 18 38
Bryan Cranston 4 5 8 5 3 10 35
Nick Kroll 6 3 7 5 6 5 32
Will Ferrell 5 4 10 3 5 3 30
Patton Oswalt 9 1 6 4 6 4 30
Jake Tapper 4 0 0 3 12 10 29
Anthony Anderson 3 3 7 9 3 4 29
Thomas Middleditch 9 6 1 1 5 6 28
Jim Gaffigan 6 4 3 2 7 6 28
Will Forte 4 4 7 3 8 1 27
Judd Apatow 6 4 6 3 4 4 27
Keegan-Michael Key 5 5 7 3 2 5 27
Seth Rogen 1 3 5 6 5 7 27
John Lithgow 3 3 7 2 4 7 26
John Mulaney 2 1 6 1 10 6 26
Tracy Morgan 5 3 5 6 6 1 26
Tig Notaro 8 3 8 0 1 6 26
Chelsea Handler 3 6 9 1 6 1 26

Kind of bizarre that Bernie Sanders is the most frequent late night guest for the last 7 years.

Anyway, the six hosts can be bucketed along a few dimensions, which are visible even in a cursory glance at this leaderboard of guest visits.

  • 3 of them have ties to SNL: Conan, Fallon, Meyers.
  • 3 are in New York City: Fallon, Meyers, Colbert.
  • 3 are in Los Angeles: Conan, Corden, Kimmel.

Then there are their various networks (TBS, CBS, ABC, NBC) and the conglomerates that own them -- Kimmel gets more of the Avengers because Disney owns ABC. Corden's English, so he's bound to get more Brits, and his producer Ben Winston has an in with the NBA somehow, so you'll see people like JJ Redick and Steph Curry show up alongside the more bread-and-butter guests, which are Broadway types.

It's also interesting to notice which guests like doing late night shows but shun certain hosts. There's probably two reasons for this:

1.) The guest doesn't want to do that person's show, because it's too small, or they don't like the host. 2.) The booker doesn't think the guest will work on their show.

Corden's show is an apolitical couch-hang in the style of Graham Norton -- Bernie Sanders isn't going to work in that format.

And I can't imagine Tig Notaro's sense of humor meshing with Jimmy Kimmel's.

Somebody like Bryan Cranston or Nick Kroll, on the other hand, will talk to anybody.

Lets try and figure out which celebrities fall where.

The Snubs & The Loyalists

Quantifying Guest Similarity

We can think of each row in our appearances table as a 6-dimensional vector, representing that celebrity's affinity for the various late night hosts. Using cosine similarity, we can figure out which guests have similar appearance patterns.

In [230]:
vectors = aps.join(node_ids).set_index('Id').drop(columns='total').dropna().sort_index().drop(index=[np.nan])
In [231]:
vectors
Out[231]:
Conan O'Brien James Corden Jimmy Fallon Jimmy Kimmel Seth Meyers Stephen Colbert
Id
0.0 0 0 1 1 1 1
1.0 0 0 0 0 4 0
2.0 0 0 0 1 0 0
3.0 0 0 0 0 1 0
4.0 0 2 0 1 1 1
... ... ... ... ... ... ...
5982.0 0 0 1 0 0 0
5983.0 0 1 0 0 0 0
5984.0 0 2 0 0 0 0
5985.0 0 1 0 0 0 1
5986.0 0 2 0 1 1 1

5987 rows × 6 columns

Now we'll tack on 13 synthetic guests.

  • Six will represent someone who appears equally on all but one of the late night shows, which they never appear on.
  • Another six will be loyalists, who only appear on one of the shows.
  • The last will be a social butterfly who's happy to show up anywhere.

First we'll put their ids into our lookup, then append them to the end of a numpy array. From there, we use scipy's distance functions to compute a squareform distance matrix -- that means that every guest will be compared against every other guest, using cosine similarity.

In [279]:
vectors.columns.values
Out[279]:
array(["Conan O'Brien", 'James Corden', 'Jimmy Fallon', 'Jimmy Kimmel',
       'Seth Meyers', 'Stephen Colbert'], dtype=object)
In [280]:
mapping = inverse_node_ids.to_dict()['Label']
max_key = max(mapping.keys())
for i, hated_host in enumerate(vectors.columns.values):
    key = max_key + 1 + i
    value = f"Snubs {hated_host}"
    mapping[key] = value

max_key = max(mapping.keys())
for i, loved_host in enumerate(vectors.columns.values):
    key = max_key + 1 + i
    value = f"Loves {loved_host}"
    mapping[key] = value
    
mapping[max(mapping.keys()) + 1] = "Social Butterfly"
In [281]:
snub_vectors = np.ones((6,6))
np.fill_diagonal(snub_vectors, 0)

love_vectors = np.zeros((6,6))
np.fill_diagonal(love_vectors, 1)

butterfly = np.ones((1,6))

data = vectors.to_numpy()
data = np.concatenate((data, snub_vectors, love_vectors, butterfly))
In [297]:
pdist = distance.pdist(data, 'cosine')
sims = pd.DataFrame(distance.squareform(pdist))
sims.index = sims.index.map(mapping)
sims.columns = sims.columns.map(mapping)
sims = sims.join(total_appearances)
freq_guests = sims[sims.total >= 10]
freq_guests = freq_guests.transpose()

Results

Spoiler -- there aren't many legit snubs in here. Either the vector I chose for cosine similarity isn't a great measure, or this type of pettiness is simply rare.

It is interesting to note some of the loyalists. For instance, Kimmel & Jimmy Fallon each have their own animal handler.

And there's a curious West Wing connection with the Corden show: he's had Aaron Sorkin, Allison Janney, and Bradley Whitford on quite a bit.

In [304]:
print("Maximum similarity == 0, Maximum dissimilarity == 1\n")
synth_df = freq_guests.reindex(list(mapping.values())[-13:])
for index, row in synth_df.iterrows():
    print(index, '\n')
    print(row.sort_values().head(20))
    print('\n***********\n')
Maximum similarity == 0, Maximum dissimilarity == 1

Snubs Conan O'Brien 

Anne Hathaway        0.010051
James Bay            0.016130
Bebe Rexha           0.016130
Hugh Grant           0.016130
Tiffany Haddish      0.018844
Carey Mulligan       0.020204
Henry Winkler        0.031754
Maren Morris         0.033908
Seth Rogen           0.034384
Charlize Theron      0.042159
Samuel L. Jackson    0.044659
Christian Slater     0.045208
Fall Out Boy         0.046537
Kelsea Ballerini     0.046537
Chris Pine           0.046537
Jessica Biel         0.046537
Chelsea Clinton      0.046537
Kane Brown           0.051317
Jim Parsons          0.052242
Michael Keaton       0.053271
Name: Snubs Conan O'Brien, dtype: float64

***********

Snubs James Corden 

Sebastian Maniscalco    0.017292
Abbi Jacobson           0.020204
Nick Kroll              0.033333
Daniel Radcliffe        0.046537
Josh Brolin             0.046537
Patton Oswalt           0.049053
Naomi Watts             0.067495
Emilia Clarke           0.067495
Kaley Cuoco             0.067495
Jennifer Lawrence       0.070330
Paul Bettany            0.070330
Jake Gyllenhaal         0.078557
Oscar Isaac             0.087129
Jack Black              0.087129
Andy Samberg            0.090412
Sarah Silverman         0.091261
Sarah Paulson           0.091312
Paul Rudd               0.092041
Adam Driver             0.092041
Julia Louis-Dreyfus     0.092885
Name: Snubs James Corden, dtype: float64

***********

Snubs Jimmy Fallon 

Ed Helms              0.009133
Jerrod Carmichael     0.031037
Eva Longoria          0.031754
Wanda Sykes           0.046537
Sean Hayes            0.046537
Patrick Stewart       0.053271
Jeff Goldblum         0.055001
Kumail Nanjiani       0.060851
Tracee Ellis Ross     0.068757
Regina Hall           0.069051
Amanda Peet           0.079642
Billy Eichner         0.079642
Jim Gaffigan          0.087129
Jason Sudeikis        0.094211
Thomas Middleditch    0.100000
Natasha Leggero       0.100000
X Ambassadors         0.100000
Josh Hutcherson       0.100000
Bishop Briggs         0.100000
Adam Pally            0.113407
Name: Snubs Jimmy Fallon, dtype: float64

***********

Snubs Jimmy Kimmel 

Jim Gaffigan              0.050614
Rashida Jones             0.051317
Cedric the Entertainer    0.051317
Judd Apatow               0.055001
Joel McHale               0.062242
Jennifer Lawrence         0.070330
John Lithgow              0.079642
Molly Shannon             0.079642
Joe Manganiello           0.079642
Keegan-Michael Key        0.083007
Anna Kendrick             0.086500
Jason Segel               0.086500
Jesse Tyler Ferguson      0.086741
Sarah Silverman           0.091261
Jason Sudeikis            0.094211
Dana Carvey               0.095466
Eva Longoria              0.096304
Nick Kroll                0.100000
Thomas Middleditch        0.100000
Ice Cube                  0.102451
Name: Snubs Jimmy Kimmel, dtype: float64

***********

Snubs Seth Meyers 

Aaron Paul            0.029857
Marc Maron            0.037860
Keegan-Michael Key    0.044799
Tony Hale             0.057837
Mila Kunis            0.060448
Leon Bridges          0.069051
Paul Bettany          0.070330
Kate Hudson           0.070330
Armie Hammer          0.070484
Bryan Cranston        0.074309
Chris Pratt           0.076619
Jeff Bridges          0.078557
Don Cheadle           0.083485
Bob Odenkirk          0.086500
Mark Wahlberg         0.086741
Paul Rudd             0.092041
Melissa McCarthy      0.094211
Judd Apatow           0.094376
Gary Clark Jr.        0.095466
Rob Lowe              0.096492
Name: Snubs Seth Meyers, dtype: float64

***********

Snubs Stephen Colbert 

Tracy Morgan            0.026876
Bill Hader              0.039841
Terry Crews             0.041486
Kane Brown              0.051317
Whitney Cummings        0.051317
Cobie Smulders          0.051317
Rashida Jones           0.051317
Lil Rel Howery          0.051317
Dave Franco             0.053271
Lucy Hale               0.053271
Will Forte              0.066052
Kaley Cuoco             0.067495
Lukas Graham            0.069051
Elizabeth Olsen         0.069739
Kate Hudson             0.070330
Don Cheadle             0.083485
Bob Odenkirk            0.086500
Jesse Tyler Ferguson    0.086741
Andy Samberg            0.090412
Sarah Silverman         0.091261
Name: Snubs Stephen Colbert, dtype: float64

***********

Loves Conan O'Brien 

Tom Papa             0.047421
Ron Funches          0.071523
Timothy Olyphant     0.090863
Kevin Nealon         0.092735
Jane Lynch           0.115348
Nikki Glaser         0.116548
Nicole Byer          0.133975
Bill Burr            0.152002
Sam Richardson       0.178005
Lisa Kudrow          0.180712
Chris Hardwick       0.197045
Megan Mullally       0.230200
Mark Normand         0.237507
Kristin Chenoweth    0.254644
Natasha Lyonne       0.262790
Pete Holmes          0.278005
Zach Woods           0.285565
Anna Faris           0.292893
Louie Anderson       0.292893
Kristen Schaal       0.292893
Name: Loves Conan O'Brien, dtype: float64

***********

Loves James Corden 

Ben Schwartz         0.012122
Jason Schwartzman    0.074180
Lily Tomlin          0.085009
Aaron Sorkin         0.116117
Allison Janney       0.148794
Max Greenfield       0.151472
Gordon Ramsay        0.170439
Niall Horan          0.191710
Bradley Whitford     0.191710
Josh Gad             0.195970
Judy Greer           0.198216
Alicia Keys          0.199359
Josh Groban          0.212161
Ben Platt            0.215535
January Jones        0.215535
Kate Beckinsale      0.230200
Adam Pally           0.237507
Kurt Russell         0.237507
Joel Edgerton        0.237999
Sharon Stone         0.244071
Name: Loves James Corden, dtype: float64

***********

Loves Jimmy Fallon 

Robert Irwin          0.000000
Dan White             0.000000
Fran Lebowitz         0.006116
Kate Upton            0.015268
Alex Rodriguez        0.023813
Michael Shannon       0.024100
Miley Cyrus           0.033012
Big Sean              0.043817
Jessica Alba          0.043817
Ariana Grande         0.046002
Alec Baldwin          0.050842
Hugh Jackman          0.059279
Michael Strahan       0.064240
Tyler Perry           0.071721
Lilly Singh           0.074180
Daveed Diggs          0.080855
Lin-Manuel Miranda    0.087129
Rita Ora              0.088678
Sienna Miller         0.088678
Ryan Reynolds         0.088678
Name: Loves Jimmy Fallon, dtype: float64

***********

Loves Jimmy Kimmel 

Dave Salmoni         0.000000
Viola Davis          0.051317
John Stamos          0.056120
DJ Khaled            0.064586
Snoop Dogg           0.072827
Zach Galifianakis    0.081441
Colin Farrell        0.085009
Chadwick Boseman     0.085009
Magic Johnson        0.088678
Justin Theroux       0.116117
Willie Nelson        0.142507
Lauren Cohan         0.166667
Channing Tatum       0.178005
Octavia Spencer      0.178005
Casey Affleck        0.178005
Jason Bateman        0.183503
Eric Andre           0.188893
Kobe Bryant          0.188893
Chris Hemsworth      0.197045
Kristen Bell         0.204505
Name: Loves Jimmy Kimmel, dtype: float64

***********

Loves Seth Meyers 

Jon Theodore        0.000000
Brann Dailor        0.000000
Mark Guiliana       0.000000
Jeremy Gara         0.000000
Taylor Schilling    0.051317
Wendy Williams      0.056120
Nicolle Wallace     0.095466
Chris Hayes         0.105573
Colin Jost          0.105573
Martha Stewart      0.127128
Aidy Bryant         0.151472
Kenan Thompson      0.161372
Elijah Wood         0.163340
Colin Quinn         0.170439
Michael Moore       0.171483
Taran Killam        0.178005
Ted Danson          0.178005
John Goodman        0.183503
Ike Barinholtz      0.215535
Stacey Abrams       0.231779
Name: Loves Seth Meyers, dtype: float64

***********

Loves Stephen Colbert 

John Dickerson                  0.000000
Laura Benanti                   0.004107
Jon Stewart                     0.004107
Jon Batiste                     0.004963
Triumph the Insult Comic Dog    0.015268
Neil deGrasse Tyson             0.053271
Gayle King                      0.056544
Samantha Bee                    0.076240
Tom Hanks                       0.082337
Jon Favreau                     0.082337
Joe Biden                       0.085009
Anderson Cooper                 0.113204
Steve Carell                    0.115348
Elizabeth Warren                0.119549
James Taylor                    0.133975
Lewis Black                     0.167950
Christine Baranski              0.191710
John Oliver                     0.193401
Rob Corddry                     0.209431
Leslie Odom Jr.                 0.215535
Name: Loves Stephen Colbert, dtype: float64

***********

Social Butterfly 

Nick Kroll            0.026271
Judd Apatow           0.029505
Rashida Jones         0.037750
Kane Brown            0.037750
Jason Sudeikis        0.042573
Paul Rudd             0.043635
Jerrod Carmichael     0.047421
Henry Winkler         0.057191
Eva Longoria          0.057191
Keegan-Michael Key    0.058267
Chris Pratt           0.063414
Emilia Clarke         0.063618
Kaley Cuoco           0.063618
Naomi Watts           0.063618
Gary Clark Jr.        0.064181
Christian Slater      0.066141
Jim Gaffigan          0.066667
Sarah Silverman       0.066743
Aaron Paul            0.070104
Paul Bettany          0.074180
Name: Social Butterfly, dtype: float64

***********

Elo Rankings

In [99]:
elo = pd.DataFrame(pecking_order.ratingDict.items(), columns=['guest', 'rating']).set_index('guest').sort_values('rating', ascending=False)
In [106]:
elo.rating.hist(bins=50)
Out[106]:
<matplotlib.axes._subplots.AxesSubplot at 0x215b0c80c50>

At the top of the rankings we've got a few different types of guests:

The pros:

  • The pros: David Spade, Martin Short
  • 30 Rock Alums: Tina Fey, Will Ferrell, Will Forte, Andy Samberg
  • Other talk show hosts: Chelsea Handler, Michael Strahan, Rachel Maddow, John Oliver, Jake Tapper, Joel McHale
  • Stars on the same network as the late night show: Allison Janney, James Spader, Anthony Anderson, Jim Parsons
  • America's favorite stars: Adam Sandler, the Breaking Bad boys, Margot Robbie, Jennifer Lopez, Kevin Bacon, Sam Jackson, Queen Latifah, Kevin Hart
In [306]:
elo.head(50)
Out[306]:
rating
guest
Bryan Cranston 1711.848495
James Spader 1703.215048
David Spade 1695.060902
Allison Janney 1694.220199
Keegan-Michael Key 1693.731601
Adam Sandler 1693.128107
Jon Favreau 1691.743497
Aaron Paul 1682.224610
Ethan Hawke 1681.323365
Tina Fey 1679.698980
Margot Robbie 1675.604065
Will Ferrell 1673.999778
Ricky Gervais 1671.251929
Will Smith 1670.955120
Michael Strahan 1670.601086
Kristen Bell 1668.861513
Bernie Sanders 1668.407999
Ice Cube 1667.854743
Andy Samberg 1666.565533
Anthony Anderson 1666.468296
Armie Hammer 1662.999874
Chelsea Handler 1662.527282
Jennifer Lopez 1659.958327
Kevin Bacon 1659.180172
Amy Adams 1658.298333
David Duchovny 1657.470689
Dr. Phil McGraw 1656.596703
Jim Parsons 1656.114500
Martin Short 1656.002147
Will Forte 1655.517357
Taraji P. Henson 1652.444117
Nick Offerman 1652.404197
Chris Pratt 1652.179327
Kevin Hart 1651.209151
Rachel Maddow 1650.654205
Samuel L. Jackson 1648.536374
John Oliver 1648.045654
Anna Kendrick 1646.139113
Alec Baldwin 1646.104553
Bill Hader 1645.502794
Queen Latifah 1645.495597
Jeff Daniels 1643.560541
Seth Rogen 1642.438739
Charlize Theron 1640.910137
Will Arnett 1640.856636
Joel McHale 1639.735609
Anderson Cooper 1639.267029
Tracy Morgan 1638.985590
Viola Davis 1638.917915
Jake Tapper 1638.839612

Spader's & Maddow's presences definitely feels mandated by NBC execs -- look where they show up:

In [309]:
aps.loc['James Spader']
Out[309]:
host
Conan O'Brien       0
James Corden        0
Jimmy Fallon       10
Jimmy Kimmel        0
Seth Meyers         9
Stephen Colbert     2
total              21
Name: James Spader, dtype: int64
In [317]:
aps.loc['Rachel Maddow']
Out[317]:
host
Conan O'Brien       0
James Corden        0
Jimmy Fallon       11
Jimmy Kimmel        0
Seth Meyers         8
Stephen Colbert     3
total              22
Name: Rachel Maddow, dtype: int64

The bottom is musical guests, who always close out the show. The very bottom are the drummers-in-residence that Seth Meyers would have on while Fred Armisen wasn't around -- these people would get last billing and stick around for a few weeks, which explains the Elo pummeling they took. Ditto for the Game of Thrones kids -- they would show up in big groups, which meant they'd "lose" to 8 or 9 people in a single appearance.

In [107]:
elo.tail(30)
Out[107]:
rating
guest
Leon Bridges 1404.203134
Bleachers 1403.706861
Isaac Hempstead Wright 1403.171595
Sophie Turner 1402.968809
Kelsea Ballerini 1401.080852
Lukas Graham 1400.078824
Dan White 1399.246779
Mark Normand 1397.383719
Maren Morris 1396.011745
Stanton Moore 1392.912157
Tove Lo 1391.590654
Kacey Musgraves 1390.567244
Nikki Glaspie 1389.710366
Chad Smith 1389.372027
Caitlin Kalafus 1389.161127
Lucius 1388.060259
Julia Michaels 1381.885999
Weezer 1381.011165
Atom Willard 1380.945116
Michel'Le Baptiste 1380.823804
Elle King 1377.031675
Ilan Rubin 1372.402110
Gary Clark Jr. 1369.605749
Allison Miller 1369.538791
Thaddeus Dixon 1369.218873
Abe Laboriel Jr. 1368.727831
Mark Guiliana 1337.010711
Jeremy Gara 1336.136429
Brann Dailor 1325.433136
Jon Theodore 1267.688960
In [320]:
elo_aps = elo.join(aps)
elo_aps = elo_aps[elo_aps.total > 15]
elo_aps.to_csv('elo.csv')

It's kind of interesting to check out frequent, non-musical guests who are hovering around the average of 1500 -- tells you that they're typically batting second. A lot of the people on here are just good at being guests -- Andrew Rannells, Bill Burr, Marc Maron, Jenny Slate, Pete Holmes -- but aren't famous enough to lead off a show.

In [323]:
elo_aps.tail(30)
Out[323]:
rating Conan O'Brien James Corden Jimmy Fallon Jimmy Kimmel Seth Meyers Stephen Colbert total
guest
Cobie Smulders 1551.853563 2 4 2 4 3 1 16
Judd Apatow 1550.884803 6 4 6 3 4 4 27
Neil deGrasse Tyson 1550.337844 2 3 0 0 1 11 17
Elizabeth Olsen 1545.833770 1 4 3 4 3 1 16
Michelle Obama 1544.581579 2 1 5 3 1 4 16
Leslie Jones 1543.857037 1 0 6 2 7 0 16
Andrew Rannells 1543.000035 1 6 8 0 7 3 25
Gabrielle Union 1536.373108 3 4 6 1 1 1 16
Bill Burr 1528.866474 8 0 2 4 2 1 17
Gwen Stefani 1528.253272 0 3 9 6 4 1 23
Blake Shelton 1527.890401 0 0 10 2 5 1 18
Elizabeth Warren 1525.698706 1 1 3 3 3 10 21
Alison Brie 1520.838386 0 5 5 3 4 1 18
Marc Maron 1518.910085 4 4 5 2 0 3 18
Kelly Clarkson 1516.841804 0 1 8 1 7 0 17
Kathryn Hahn 1515.804456 0 2 6 3 3 2 16
John Legend 1512.603555 0 2 8 2 2 2 16
Jenny Slate 1506.153366 1 7 4 4 3 1 20
Tig Notaro 1503.540767 8 3 8 0 1 6 26
Miley Cyrus 1502.471276 0 0 12 3 0 1 16
Zach Woods 1499.378025 7 6 0 1 3 1 18
Mike Birbiglia 1498.575928 1 2 7 4 4 3 21
Pete Holmes 1491.035841 7 5 0 2 0 4 18
Meghan Trainor 1442.086526 0 4 11 3 0 1 19
Shawn Mendes 1414.570159 1 8 8 0 0 0 17
Alessia Cara 1409.316346 0 2 8 3 3 1 17
Weezer 1381.011165 1 3 5 5 1 1 16
Elle King 1377.031675 0 4 2 2 6 2 16
Gary Clark Jr. 1369.605749 3 2 5 3 2 2 17
Jon Theodore 1267.688960 0 0 0 0 20 0 20

Preparing a Gephi visualization

Gephi can help explore social networks. To import data, you need to create CSVs with a particular format: first an edge list containing Source,Target, and then a nodes table with some extra info: in this case I'm tracking total appearances across shows, and the average billing order.

Click the image for the full resolution:
In [274]:
total_appearances = aps['total']
average_billing = df.groupby('guest').mean()
In [275]:
node_info = pd.concat([total_appearances, average_billing], axis=1)
In [276]:
edges = df.reset_index()[['guest', 'host']]
edges = edges.merge(node_ids, left_on='guest', right_index=True).merge(node_ids, left_on='host', right_index=True)
edges.rename(columns={'Id_x': 'Source', 'Id_y': 'Target'})[['Source', 'Target']].to_csv('edges.csv', index=False)
In [277]:
node_ids.join(node_info).reset_index()[['Id', 'Label', 'total', 'billing']].to_csv('nodes.csv', index=False)

Loose Ends (just for my reference)

In [122]:
avg_billing = appearances_per_show.pivot_table(values='mean', index='guest', columns='host', margins=True, fill_value=None, margins_name='total')
In [123]:
avg_billing['slot'] = avg_billing.total.round()
avg_billing['total_appearances'] = aps.total
In [124]:
avg_billing.sort_values(['slot', 'total_appearances'], ascending=[True, False]).head(25)
Out[124]:
host Conan O'Brien James Corden Jimmy Fallon Jimmy Kimmel Seth Meyers Stephen Colbert total slot total_appearances
guest
Bernie Sanders 1.000000 NaN 1.200000 1.142857 1.166667 1.500000 1.201905 1.0 39
John Oliver NaN NaN 1.285714 1.500000 1.000000 1.666667 1.363095 1.0 38
Bryan Cranston 1.000000 1.000000 1.000000 1.200000 1.000000 1.600000 1.133333 1.0 35
Will Ferrell 1.000000 1.000000 1.400000 1.666667 1.400000 1.666667 1.355556 1.0 30
Anthony Anderson 1.000000 2.333333 1.285714 1.000000 1.000000 1.000000 1.269841 1.0 29
Jim Gaffigan 1.000000 1.500000 2.000000 1.000000 1.142857 1.333333 1.329365 1.0 28
Keegan-Michael Key 1.200000 1.400000 1.142857 1.333333 1.000000 1.400000 1.246032 1.0 27
Seth Rogen 1.000000 1.666667 1.200000 1.000000 1.000000 1.428571 1.215873 1.0 27
Will Forte 1.000000 2.000000 1.000000 1.000000 1.000000 1.000000 1.166667 1.0 27
Chelsea Handler 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.0 26
John Lithgow 2.000000 1.000000 1.428571 1.000000 1.000000 1.571429 1.333333 1.0 26
John Mulaney 1.500000 2.000000 1.333333 1.000000 1.400000 1.500000 1.455556 1.0 26
Ricky Gervais 1.000000 NaN 1.000000 NaN 1.000000 1.142857 1.035714 1.0 26
Tracy Morgan 1.400000 1.000000 1.000000 1.500000 1.000000 1.000000 1.150000 1.0 26
Adam Sandler 1.000000 1.000000 1.000000 1.000000 1.000000 NaN 1.000000 1.0 25
Allison Janney 1.333333 1.090909 1.000000 1.000000 1.000000 1.250000 1.112374 1.0 25
Ice Cube 1.000000 1.400000 1.222222 1.000000 1.000000 1.000000 1.103704 1.0 25
Jeff Goldblum 1.000000 1.285714 2.000000 1.666667 1.000000 1.800000 1.458730 1.0 25
Joel McHale 1.000000 1.500000 1.000000 1.000000 1.000000 1.000000 1.083333 1.0 25
John Cena 2.000000 2.000000 1.454545 1.250000 1.250000 1.000000 1.492424 1.0 25
Kevin Hart 1.000000 1.000000 1.111111 1.000000 1.000000 1.000000 1.018519 1.0 25
Martin Short 1.000000 1.000000 1.300000 1.333333 1.000000 1.000000 1.105556 1.0 25
Bob Odenkirk 2.400000 1.200000 1.333333 1.285714 1.000000 1.500000 1.453175 1.0 24
Kevin Bacon 1.000000 1.400000 1.000000 1.000000 1.000000 1.000000 1.066667 1.0 24
Kristen Bell 1.000000 1.750000 1.000000 1.666667 1.333333 1.500000 1.375000 1.0 24
In [52]:
pcts = aps.div(aps.total, axis=0)
In [61]:
pcts.sort_values("Conan O'Brien", ascending=False).iloc[400:430]
Out[61]:
host Conan O'Brien James Corden Jimmy Fallon Jimmy Kimmel Seth Meyers Stephen Colbert total
guest
Oh Wonder 1.000000 0.000000 0.000 0.000000 0.00 0.000 1.0
Olan Rogers 1.000000 0.000000 0.000 0.000000 0.00 0.000 1.0
Calexico 1.000000 0.000000 0.000 0.000000 0.00 0.000 1.0
Tanishq Abraham 1.000000 0.000000 0.000 0.000000 0.00 0.000 1.0
Dan Naturman 1.000000 0.000000 0.000 0.000000 0.00 0.000 1.0
clipping. 1.000000 0.000000 0.000 0.000000 0.00 0.000 1.0
Milo Greene 1.000000 0.000000 0.000 0.000000 0.00 0.000 1.0
Joe Lo Truglio 1.000000 0.000000 0.000 0.000000 0.00 0.000 1.0
Frankie Muniz 1.000000 0.000000 0.000 0.000000 0.00 0.000 1.0
Tavis Smiley 1.000000 0.000000 0.000 0.000000 0.00 0.000 1.0
Scraps Show 1.000000 0.000000 0.000 0.000000 0.00 0.000 1.0
Mayor Melvin Carter 1.000000 0.000000 0.000 0.000000 0.00 0.000 1.0
Pierce the Veil 1.000000 0.000000 0.000 0.000000 0.00 0.000 1.0
School of Rock 1.000000 0.000000 0.000 0.000000 0.00 0.000 1.0
Milk Carton Kids 1.000000 0.000000 0.000 0.000000 0.00 0.000 1.0
Rory Scovel 0.857143 0.000000 0.000 0.142857 0.00 0.000 1.0
Tom Segura 0.800000 0.000000 0.000 0.000000 0.00 0.200 1.0
Melissa Rauch 0.800000 0.000000 0.000 0.000000 0.00 0.200 1.0
Brian Posehn 0.800000 0.000000 0.000 0.000000 0.20 0.000 1.0
Billy Gardell 0.750000 0.000000 0.000 0.000000 0.00 0.250 1.0
Jen Kirkman 0.750000 0.000000 0.000 0.000000 0.00 0.250 1.0
JB Smoove 0.750000 0.000000 0.125 0.000000 0.00 0.125 1.0
Seann William Scott 0.750000 0.000000 0.000 0.000000 0.25 0.000 1.0
Conleth Hill 0.750000 0.000000 0.000 0.000000 0.25 0.000 1.0
Nasim Pedrad 0.750000 0.000000 0.000 0.250000 0.00 0.000 1.0
Isaac Hempstead Wright 0.750000 0.000000 0.000 0.250000 0.00 0.000 1.0
Deon Cole 0.750000 0.000000 0.000 0.000000 0.00 0.250 1.0
Giancarlo Esposito 0.750000 0.000000 0.000 0.000000 0.25 0.000 1.0
Ron Funches 0.714286 0.285714 0.000 0.000000 0.00 0.000 1.0
Tom Papa 0.700000 0.000000 0.000 0.000000 0.10 0.200 1.0
In [22]:
avg_billing[(avg_billing.total > 2.0) & (avg_billing.total < 3.0)].tail(30)
Out[22]:
host Conan O'Brien James Corden Jimmy Fallon Jimmy Kimmel Seth Meyers Stephen Colbert total
guest
Van Jones 1.750000 NaN NaN 2.000000 2.50000 2.000000 2.062500
Viet Thanh Nguyen NaN NaN NaN NaN 2.50000 NaN 2.500000
Walter Isaacson NaN NaN NaN NaN NaN 2.333333 2.333333
Walton Goggins 1.500000 2.000000 NaN 4.000000 2.00000 2.000000 2.300000
Why Don't We NaN 3.000000 NaN 2.000000 NaN NaN 2.500000
Willie Nelson NaN NaN 1.500000 2.200000 2.00000 3.000000 2.175000
Winnie Harlow NaN NaN 2.500000 NaN NaN NaN 2.500000
Winston Duke NaN NaN NaN 3.000000 2.00000 NaN 2.500000
Wiz Khalifa NaN 1.000000 3.000000 2.500000 NaN NaN 2.166667
Wolf Alice 3.000000 2.000000 NaN NaN NaN 3.000000 2.666667
Wu-Tang Clan NaN NaN 3.000000 2.000000 NaN NaN 2.500000
Wyatt Cenac 2.000000 3.000000 2.333333 NaN 3.00000 2.500000 2.566667
X Ambassadors 3.000000 3.333333 3.000000 3.000000 3.00000 2.000000 2.888889
Yahya Abdul-Mateen II NaN 3.000000 NaN 2.000000 2.00000 2.000000 2.250000
Years & Years NaN 2.000000 2.000000 NaN 2.50000 NaN 2.166667
Yvette Nicole Brown NaN NaN NaN 2.000000 NaN 3.000000 2.500000
Zach Woods 2.285714 2.333333 NaN 2.000000 2.00000 2.000000 2.123810
Zazie Beetz NaN 3.000000 NaN 2.000000 2.00000 NaN 2.333333
Ziggy Marley NaN 2.666667 NaN NaN NaN 3.000000 2.833333
Zlatan Ibrahimovic NaN 2.000000 NaN 3.000000 NaN NaN 2.500000
Zlatan Ibrahimović NaN 3.000000 NaN 2.000000 NaN NaN 2.500000
Zoe Saldana NaN 2.000000 NaN 2.333333 NaN 2.000000 2.111111
Zoey Deutch 2.000000 2.000000 2.200000 2.000000 2.00000 3.000000 2.200000
guest host Kerry Washington NaN NaN NaN 2.500000 NaN NaN 2.500000
guest host Rob Lowe NaN NaN NaN 2.500000 NaN NaN 2.500000
guest host Sebastian Maniscalco NaN NaN NaN 2.500000 NaN NaN 2.500000
the Avett Brothers NaN NaN 3.000000 2.000000 3.00000 NaN 2.666667
the Band Perry NaN 3.000000 NaN 2.500000 NaN NaN 2.750000
the Mighty Mighty Bosstones NaN NaN NaN 2.500000 NaN NaN 2.500000
total 2.308189 2.125559 2.378996 2.280121 2.24204 2.183413 2.254749
In [232]:
aps.sort_values('total', ascending=False).to_csv("appearances_per_show_per_guest.csv")