So… What Makes an MLB All-Star?

The first “All-Star Game” in the history of professional sports was a baseball exhibition held during the 1933 World’s Fair. Since then, the “Midsummer Classic” has become an annual tradition, interrupted only in 1945 because of World War II. Although ostensibly an exhibition game for fans, it has become an important negotiating tool for players. Free agents often use All-Star Game appearances as leverage to get more money from teams. Players who make All-Star Games can also gain additional endorsement opportunities, especially at the national level. As such, making an All-Star Game is an important achievement for the financial prospects of professional baseball players. Even after their careers end, players often use All-Star appearances as marketing tools to secure endorsements, much like Stan Ross.

The Original 1933 MLB All-Star Team.

But with All-Stars being selected under no defined guidelines, what factors go into being selected to the game? I attempted to find the most significant variables that led to All-Star Game appearances by looking at data from every position player with over 200 plate appearances in 2014, enough to qualify a player as “full-time”. This equated to 383 players, of whom 47 were selected to the All-Star Game. The independent variables used were games played, plate appearances, runs, hits, doubles, triples, home runs, runs batted in, stolen bases, times caught stealing (coded in reverse), walks, strikeouts, batting average (multiplied by 1000), on-base percentage (multiplied by 1000), slugging percentage (multiplied by 1000), times grounded into double plays, times hit by pitch and sacrifice hits.

The independent variables selected were standard batting statistics, as defined by Major League Baseball. Advanced statistics were not used because they are not as commonly cited in mainstream media. With the fan vote weighing heavily in selection, and with most fans gaining their baseball knowledge from mainstream media, standard batting statistics were judged to be the most appropriate measures for this study. Statistics were gathered from Baseball Reference, exported into CSV format, and filtered using Microsoft Excel.

After preparing the data, I ran a binary logistic regression to see which independent variables were statistically significant in predicting an All-Star Game appearance. The model found games played, times caught stealing, strikeouts, on-base percentage and slugging percentage to be significant at the 90% confidence level. Using these variables, the model was 91.1% correct in predicting whether a player made the All-Star Game.
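One way to put the 91.1% figure in context is to compare it with a naive baseline. The sketch below is illustrative only and is not the software used for this study. It takes the counts reported above (383 players, 47 All-Stars) and computes how often a model would be right if it always predicted “not an All-Star”. The helper `classification_accuracy` and its 0.5 cutoff are assumptions for illustration; the study does not state which cutoff was used.

```python
# Counts reported in the article.
TOTAL_PLAYERS = 383
ALL_STARS = 47


def majority_baseline_accuracy(total, positives):
    """Accuracy of always predicting the more common class."""
    negatives = total - positives
    return max(positives, negatives) / total


def classification_accuracy(probabilities, labels, cutoff=0.5):
    """Share of cases where the thresholded prediction matches the label.

    The 0.5 cutoff is an assumed default, not a value taken from the study.
    """
    predictions = [1 if p >= cutoff else 0 for p in probabilities]
    correct = sum(1 for pred, y in zip(predictions, labels) if pred == y)
    return correct / len(labels)


baseline = majority_baseline_accuracy(TOTAL_PLAYERS, ALL_STARS)
print(f"Always predicting 'not an All-Star': {baseline:.1%}")  # 87.7%
```

Because only 47 of the 383 players were All-Stars, always guessing “no” is already right about 87.7% of the time, so the model’s 91.1% should be read as an improvement over that baseline rather than over a coin flip.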

2014 All-Star Game MVP Mike Trout.

Heading into this study, I hypothesized that the significant factors would include home runs, runs batted in, batting average and stolen bases. Although none of these were found to be statistically significant, some of their absences are potentially explained by the factors that did make the cut. Slugging percentage is total bases divided by at-bats, so players who hit a large number of home runs tend to have a higher slugging percentage. Additionally, although batting average is a traditionally important measure, on-base percentage is a more accurate measure of how often players get on base because it includes walks.
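To make the relationship between these rate statistics concrete, here is a minimal sketch. The slugging formula follows the definition given above. The on-base percentage formula is the standard one, which also counts times hit by pitch and sacrifice flies, details this article does not spell out. The stat line is hypothetical and does not belong to any real player.

```python
def batting_average(hits, at_bats):
    """Hits per at-bat; walks are ignored entirely."""
    return hits / at_bats


def on_base_percentage(hits, walks, hit_by_pitch, at_bats, sac_flies):
    """Standard OBP: times on base divided by the qualifying plate appearances."""
    times_on_base = hits + walks + hit_by_pitch
    return times_on_base / (at_bats + walks + hit_by_pitch + sac_flies)


def slugging_percentage(hits, doubles, triples, home_runs, at_bats):
    """Total bases per at-bat; a home run counts four times as much as a single."""
    singles = hits - doubles - triples - home_runs
    total_bases = singles + 2 * doubles + 3 * triples + 4 * home_runs
    return total_bases / at_bats


# Hypothetical season line, for illustration only.
avg = batting_average(hits=150, at_bats=500)
obp = on_base_percentage(hits=150, walks=50, hit_by_pitch=5,
                         at_bats=500, sac_flies=5)
slg = slugging_percentage(hits=150, doubles=30, triples=5,
                          home_runs=20, at_bats=500)

print(f"AVG {avg:.3f}  OBP {obp:.3f}  SLG {slg:.3f}")
# AVG 0.300  OBP 0.366  SLG 0.500
```

In this made-up line, the 50 walks lift the player from a .300 batting average to a .366 on-base percentage, and the 20 home runs account for 80 of the 250 total bases behind the .500 slugging percentage.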

Putting statistical analysis to the side, why were these variables statistically significant? Taking the factors one by one, there are several possible explanations. Players who are able to stay healthy for a full season have more opportunities to be seen by fans, which may explain why games played is a statistically significant variable. Although the relationship is not perfect, players who are caught stealing more often tend to attempt more steals, and players who attempt more steals tend to be given those opportunities because of their ability to steal bases successfully.

Strikeouts have a negative coefficient, meaning that more strikeouts lower a player’s chance of making the All-Star Game. A strikeout is the most visible kind of out and plainly hurts a team’s chance to win, so fans are likely to be negatively influenced by players who strike out excessively.

The appearance of on-base percentage as a statistically significant factor may point to a shift towards a more analytical mindset among fans. In the past, on-base percentage was often minimized in favor of batting average, which does not take walks into account. On-base percentage builds on the traditional batting average (which I had hypothesized would be statistically significant) and adds the walk, which puts a runner on base and therefore creates an additional opportunity to score. Finally, as mentioned earlier, slugging percentage could explain the absence of home runs as a statistically significant factor. Slugging percentage also takes into account doubles and triples, which likewise increase a team’s expected runs.

Baseball players can take away several lessons from this study.  According to the data, players looking to make All-Star Games, and therefore maximize their potential earnings, have a few areas of the game to focus on in training.  The first is endurance training, to help increase their total games played.  Additionally, they can work on speed, to help avoid getting caught stealing.  When they step up to bat, it is important to have a patient approach, to avoid strikeouts while maximizing their on-base percentage by not only hitting well, but by drawing more walks.  Finally, players should work on strength training to hit more home runs and achieve a higher slugging percentage.

Although this study draws on a sizable sample, outliers in the 2014 season could limit how well the model fits other seasons. Future models will look at multiple seasons rather than a single one to account for potential outliers. A model that included every season since 1933 (excluding 1945) would not be as accurate, because fan attitudes and on-field strategy have changed over time, but a model built on the past ten seasons would potentially be more accurate than the one built in this initial study.

Table 1

Statistically Significant Factors in Determining MLB All-Star Game Appearances Based on 2014 Data

Independent Variable             Beta Coefficient    Significance
G (games played)                       .060              .00
CS (times caught stealing)             .096              .10
SO (strikeouts)                       -.012              .10
OBP (on-base percentage)               .971              .06
SLG (slugging percentage)              .976              .05
Constant                            -22.249              .00
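As a rough guide to reading Table 1, the sketch below converts two of the coefficients into odds multipliers. It assumes the Beta column holds log-odds coefficients, which is the usual output of a binary logistic regression but is not stated explicitly in this article. Only games played and strikeouts are used, since both are raw counts; on-base and slugging percentage were rescaled before fitting, so they are left out here. The function name `odds_multiplier` is a hypothetical helper.

```python
import math


def odds_multiplier(beta, change=1):
    """Factor by which the odds change when a variable moves by `change` units.

    Assumes `beta` is a log-odds coefficient from a logistic regression.
    """
    return math.exp(beta * change)


# Coefficients taken from Table 1.
BETA_GAMES = 0.060
BETA_STRIKEOUTS = -0.012

one_more_game = odds_multiplier(BETA_GAMES)                   # about 1.062
ten_more_games = odds_multiplier(BETA_GAMES, 10)              # about 1.822
fifty_more_strikeouts = odds_multiplier(BETA_STRIKEOUTS, 50)  # about 0.549

print(f"+1 game:        odds x {one_more_game:.3f}")
print(f"+10 games:      odds x {ten_more_games:.3f}")
print(f"+50 strikeouts: odds x {fifty_more_strikeouts:.3f}")
```

Under that assumption, and holding the other variables fixed, ten additional games played would multiply a player’s odds of selection by roughly 1.8, while fifty additional strikeouts would cut them roughly in half. These are changes in odds, not in probability.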

Living History: A Conversation with Ozzie Virgil

History was quietly made on September 23, 1956, when Bill Rigney submitted his lineup card for the New York Giants’ Sunday afternoon home matchup with the Philadelphia Phillies. Slotted in at third base, in his major league debut, was 24-year-old Ozzie Virgil. And although Virgil wouldn’t record his first major league hit until his second game a week later, he had already made his mark by becoming the first Dominican-born player to reach the Major Leagues.

Anthony Simonetti meets with Ozzie Virgil in Boca Chica, Dominican Republic.

Osvaldo Virgil was born on May 17, 1932 in Monte Cristi, Dominican Republic. At the age of 13, he immigrated with his family to the United States and settled in the Bronx. Growing up in the shadow of Yankee Stadium, Virgil attended DeWitt Clinton High School, graduating in 1950. Following high school, Virgil joined the US Marine Corps Reserves, and was called up in 1950. In 1952, after completing his service, Virgil was signed by the New York Giants following a tryout.

Virgil worked his way up through the Giants’ minor league system, spending time in St. Cloud, Minnesota; Danville, Virginia; and Dallas, Texas, before spending the 1956 season with Minneapolis, where he hit .278 with 10 home runs. This play earned him that late-season September call-up, and despite his debut being a historic landmark, Virgil arrived to almost no fanfare.

In a recent conversation at the New York Mets’ Dominican Academy in Boca Chica, Virgil recalled his path to the majors, as well as several other anecdotes about his time playing baseball.  When asked who the best player he faced or played with was, Virgil was torn between Hall of Famers Willie Mays and Sandy Koufax.  “Both of them had more than just talent, they always wanted to win.  That’s all those guys were focused on back then, winning.”

In nine seasons and 324 games as a professional, Virgil spent time with the Giants (both in New York and San Francisco), Kansas City Athletics, Baltimore Orioles, Pittsburgh Pirates and Detroit Tigers, for whom he was the first non-white player. He slashed .231/.263/.331 with 14 home runs, 73 RBIs, 174 hits and 75 runs. And though Virgil may have produced an otherwise undistinguished -0.5 rWAR, his contribution to the game will never be forgotten.

“I may not have been the most talented, and I may not hold the records or any huge numbers, but I’ll always have a special number: number one!  And I’m glad that I was able to be that person that opened the door for many other Dominicans after me, especially considering there are many others more talented than me.”

Since Virgil took the field that fateful Sunday afternoon, 628 other players born in the Dominican Republic have played in the Major Leagues, including Hall of Famers Juan Marichal and Pedro Martinez, and notable All-Stars such as Albert Pujols, Vladimir Guerrero, Sammy Sosa, Manny Ramirez, David Ortiz, Alfonso Soriano, George Bell and many others. The Dominican Republic is now one of the top producers of Major League talent outside the United States, but Virgil’s trailblazing path was much different from that of modern-day Dominican players.

Despite debuting almost a decade after Jackie Robinson broke the color barrier in 1947 with the Brooklyn Dodgers, Virgil spoke about the struggles of being a non-white player in the 1950s and 60s.  “One of the hardest parts was that we weren’t accepted within the black community, the African-American community.  It was hard being ignored by both the white people and the African-Americans, who didn’t always consider us Latinos as black.  We had to stick together.”

Speaking with Virgil, one of the most influential and perhaps most forgotten players in the history of Major League Baseball, is truly an opportunity to interact with living history. At 83, Virgil still has the spark and energy of a young man, and our afternoon spent with him was truly a powerful moment. Seeing what Dominican players have today, and hearing from Virgil what players had in the past, gives real hope that the Dominican game, and the Dominican community, will continue to grow as the years pass.