Wichita State Pitching Can't Strand Baserunners

Monday, March 28, 2016

Quick Blog:

Wichita State pitchers are struggling. They lost All-American and potential first-round draft pick Sam Tewes to injury, and the staff has felt the loss. As far as I can tell, the staff (minus Tewes) is regressing to the numbers it put up last year. Its BABIP has come back down under .350 (it was .342 last year), and its K% and BB% are now nearly identical to last year's. The K% is 19.6% and the BB% is 11.3%, which honestly is not terrible. For comparison, last year's K% and BB% were 19.7% and 11.7%, respectively.
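If you want to check rate stats like these yourself, here is a minimal Python sketch of the standard definitions: K% and BB% are per batter faced, and BABIP strips home runs and strikeouts out of the ratio. The `PitchingLine` class, its field names, and the sample totals are all hypothetical, not the actual WSU numbers.

```python
from dataclasses import dataclass


@dataclass
class PitchingLine:
    """Season totals for a pitcher or staff (field names are hypothetical)."""
    bf: int  # batters faced
    ab: int  # at-bats against
    h: int   # hits allowed
    hr: int  # home runs allowed
    k: int   # strikeouts
    bb: int  # walks
    sf: int  # sacrifice flies allowed


def k_pct(p: PitchingLine) -> float:
    """Strikeouts per batter faced."""
    return p.k / p.bf


def bb_pct(p: PitchingLine) -> float:
    """Walks per batter faced."""
    return p.bb / p.bf


def babip(p: PitchingLine) -> float:
    """Hits on balls in play divided by balls in play (no HR, no K)."""
    balls_in_play = p.ab - p.k - p.hr + p.sf
    return (p.h - p.hr) / balls_in_play


# Made-up totals, purely to show the arithmetic.
staff = PitchingLine(bf=100, ab=85, h=30, hr=2, k=20, bb=11, sf=1)
print(f"K% {k_pct(staff):.1%}  BB% {bb_pct(staff):.1%}  BABIP {babip(staff):.3f}")
```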

The line drive percentage (LD%) is also nearly what it was last year. LD% is the share of batted balls that are line drives, which gives us an estimate of how often the pitching staff is giving up hard-hit balls. It is not exact, but it is a reasonable approximation. The LD% for this year is 38%, and last season it was 37%, so the difference is within the margin of error.
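LD% is just line drives divided by all classified batted balls. A minimal sketch, assuming the play-by-play has already been tagged as ground ball, fly ball, or line drive (pop-ups are folded into fly balls here, which is my assumption and not necessarily how the NCAA data splits them). The counts are hypothetical.

```python
def batted_ball_rates(gb: int, fb: int, ld: int) -> dict:
    """Share of classified batted balls that were grounders, flies, or liners."""
    total = gb + fb + ld
    return {"GB%": gb / total, "FB%": fb / total, "LD%": ld / total}


# Hypothetical counts chosen so LD% lands at 38%.
rates = batted_ball_rates(gb=40, fb=22, ld=38)
for name, value in rates.items():
    print(f"{name}: {value:.0%}")
```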

With luck becoming less and less of a factor, we see regression to the mean in full force. That being said, the Fielding Independent Pitching (FIP) stat is actually considerably lower this year for the entire staff: 4.10 this year compared to 4.55 last year. The same can be said for ERA: 5.10 this year and 6.10 last year. What gives?
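A quick aside on FIP before answering that question. FIP only uses outcomes the pitcher controls: home runs, walks, hit batters, and strikeouts. Below is a minimal sketch, assuming the common 13/3/2 weights and a league constant chosen so that league FIP equals league ERA. I do not know which constant produced the figures above, so the league totals here are hypothetical. The `innings` helper handles the box-score convention where .1 and .2 mean thirds of an inning.

```python
def innings(ip: str) -> float:
    """Convert box-score innings such as '45.2' (45 and 2/3) to a decimal."""
    whole, _, outs = ip.partition(".")
    return int(whole) + int(outs or 0) / 3


def fip_core(hr: int, bb: int, hbp: int, k: int, ip: float) -> float:
    """The pitcher-controlled part of FIP, before the league constant."""
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip


def fip_constant(lg_era: float, lg_hr: int, lg_bb: int, lg_hbp: int,
                 lg_k: int, lg_ip: float) -> float:
    """Constant that makes league-average FIP equal league-average ERA."""
    return lg_era - fip_core(lg_hr, lg_bb, lg_hbp, lg_k, lg_ip)


def fip(hr: int, bb: int, hbp: int, k: int, ip: float, constant: float) -> float:
    return fip_core(hr, bb, hbp, k, ip) + constant


# Hypothetical league totals; a real college constant would come from NCAA data.
c = fip_constant(lg_era=5.00, lg_hr=300, lg_bb=1500, lg_hbp=400,
                 lg_k=3000, lg_ip=4500.0)
print(round(fip(hr=10, bb=60, hbp=15, k=110, ip=innings("150.1"), constant=c), 2))
```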

The answer lies with runners on base, starting with the percentage of runs scored per batter faced. This ratio is high, to say the least. Last season the Shockers finished at 14.4% Runs per Batter Faced. This year that number has ballooned to 17.2%. They are allowing roughly 17 runs for every 100 opposing batters! The main culprit behind this increase is how the pitchers have performed with runners on base.

Left-on-base percentage (LOB%) tells us how well pitchers strand runners. The higher the number, the better they are at it. Unfortunately, WSU is not doing well in this department. Check out the table below comparing Wichita State pitching LOB% over the last three years, including 2016.


Wichita State cannot keep opposing runners from scoring once they reach base. Overall the pitching staff is doing poorly in this category, and it doesn't get any better if you look at individual players. Willie Schwanke's LOB% is down more than 10% from last year. That is insane! McGinnis, Jones, and Heuer all have LOB% numbers under 50%, and they have a good deal of innings pitched and batters faced.
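Both numbers in this post come from simple counting stats. Below is a minimal sketch, assuming the FanGraphs-style LOB% formula, where the 1.4 x HR term roughly accounts for home runs clearing the bases; I am not certain this is the exact version behind the table. Runs per batter faced is just runs divided by batters faced. All totals in the example are hypothetical.

```python
def lob_pct(h: int, bb: int, hbp: int, r: int, hr: int) -> float:
    """Share of baserunners allowed who did not score (FanGraphs-style LOB%)."""
    baserunners = h + bb + hbp
    return (baserunners - r) / (baserunners - 1.4 * hr)


def runs_per_batter_faced(r: int, bf: int) -> float:
    """Runs allowed per batter faced, e.g. 0.172 for 17.2%."""
    return r / bf


# Hypothetical staff line.
print(f"LOB% {lob_pct(h=50, bb=20, hbp=5, r=30, hr=5):.1%}")
print(f"R/BF {runs_per_batter_faced(r=30, bf=175):.1%}")
```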

A team cannot compete if it cannot find some way to increase this ratio. A place to start would be bringing in high-strikeout pitchers with runners in scoring position, especially later in the game. Strikeout percentage is the most important stat to look at when analyzing relief pitchers. The Shockers have some pitchers who might be able to do this, but some are being used as starters, and rightly so. However, there are some younger freshmen who have a high K%, albeit in a limited number of batters faced. At this point, though, all resources should be considered.

Wichita State Baseball: Comparison of Each Batter's Contribution This Season

Tuesday, March 15, 2016

I finished the first iteration of parsing the NCAA D1 baseball play-by-play data today. Now I want to take a look at some advanced metrics that can be derived from this information. The first one I am looking at is the run contribution each batter has made to the team this season (2016).

Let me briefly explain what I mean by run contribution and how it fits into evaluating baseball players. Every plate appearance has the potential to produce runs, but not every batting play is equal. An obvious example is that a home run is more valuable than a single with no runners on. However, a hit with the bases loaded could be more valuable than a solo home run. The play-by-play data allows us to create a run expectancy matrix covering all eight possible base states (which bases are occupied) and all three possible out counts (0, 1, and 2), for 24 states in total. This matrix shows the run potential for each of those scenarios.

Note: This matrix is not 100 percent accurate. It is based on my first attempt to parse the play-by-play data and seems to slightly underestimate each state.
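Building the matrix comes down to one pass over the play-by-play: for every plate appearance, record how many runs scored from that point to the end of the half-inning, then average by (bases, outs) state. This is a minimal sketch, not my actual parser. The `PlateAppearance` record and its field names are hypothetical. It assumes each record already carries the runs that scored during that plate appearance, that records are in chronological order, and that every half-inning was played to three outs (walk-off or shortened innings would bias the averages and are usually dropped).

```python
from collections import defaultdict
from itertools import groupby
from typing import NamedTuple


class PlateAppearance(NamedTuple):
    half_inning: str  # hypothetical key, e.g. "2016-03-12-G1-top-3"
    bases: str        # runners before the PA: "---", "1--", "-2-", ..., "123"
    outs: int         # outs before the PA: 0, 1, or 2
    runs: int         # runs that scored during this PA


def run_expectancy(pas: list) -> dict:
    """Average runs scored from each (bases, outs) state to the end of the inning."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    # groupby only merges consecutive records, hence the chronological-order assumption.
    for _, group in groupby(pas, key=lambda pa: pa.half_inning):
        inning = list(group)
        remaining = sum(pa.runs for pa in inning)  # runs from this PA onward
        for pa in inning:
            totals[(pa.bases, pa.outs)] += remaining
            counts[(pa.bases, pa.outs)] += 1
            remaining -= pa.runs  # runs still to come after this PA
    return {state: totals[state] / counts[state] for state in totals}
```

Fed a full season of parsed plate appearances, the entry for `("1--", 0)` corresponds to the runner-on-first, no-outs expectancy discussed next.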

For example, with a runner on first and no outs, the average number of runs scored by NCAA D1 teams for the rest of the inning is, by my calculation, +.86, or about +1. If a player gets a lead-off single, his team should score around one run on average in that inning. Why does this matter? If we know the run expectancy of each state, we can determine the run contribution of each batter based on how he performed in each of those states. Knowing the run contribution of each player allows us to better evaluate the players. It lets us understand how well each batter performed in every scenario, from no runners on and no outs to bases loaded with two outs. Total all of those scenarios together and you can see the total contribution of that player. Runs Created is not RBIs or hits; it is a new value generated by estimating the run contribution of each batting play of the season.
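Here is a minimal sketch of that bookkeeping, which as far as I can tell follows the same idea as the RE24 approach: a plate appearance is worth the run expectancy after the play, minus the run expectancy before it, plus any runs that scored, and a batter's total is the sum over his plate appearances. It assumes a run expectancy table keyed by (bases, outs) state. The player names and every expectancy except the .86 quoted above are hypothetical placeholders.

```python
from collections import defaultdict


def run_value(re_table: dict, before: tuple, after, runs_scored: int) -> float:
    """RE(after) - RE(before) + runs scored; `after=None` means the inning ended."""
    re_after = 0.0 if after is None else re_table[after]
    return re_after - re_table[before] + runs_scored


def run_contributions(events, re_table: dict) -> dict:
    """Sum run values by batter. `events` yields (batter, before, after, runs)."""
    totals = defaultdict(float)
    for batter, before, after, runs in events:
        totals[batter] += run_value(re_table, before, after, runs)
    return dict(totals)


# Only the 0.86 (runner on first, no outs) comes from the text; the other
# two expectancies are made-up placeholders.
re_table = {("---", 0): 0.50, ("1--", 0): 0.86, ("---", 1): 0.27}
events = [
    ("Player A", ("---", 0), ("1--", 0), 0),  # lead-off single
    ("Player B", ("---", 0), ("---", 1), 0),  # lead-off strikeout
]
print(run_contributions(events, re_table))
```

The single comes out positive and the strikeout negative, which is exactly the above-the-line and below-the-line split in the charts further down.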

Without further ado...

Run Contribution Chart For Players With More Than 10 AB's

No surprise, Troutwine tops the list with nearly 6 Runs Created. To put things into perspective, the University of Louisville's Corey Ray, one of the best players in college baseball, has a Runs Created stat of around 7. The players who have negative values should not be discouraged, because it is early in the season. A solid player will end with a Runs Created (RC) of around 15, based on the averages I calculated from the 2015 play-by-play data.

As you can see from the chart, a high or low OPS (OBP + SLG) does not necessarily mean that a player is contributing more or less than another player. For example, a player who hit ten triples in ten at-bats with the bases empty would have a very high OPS and a decent Runs Created value. On the other hand, a player who hit ten doubles in ten at-bats with the bases loaded would have a lower OPS than the first player, but a higher Runs Created. Obviously this is an extreme hypothetical, but it makes the point. Runs Created goes beyond the triple slash to paint a more accurate picture of the contribution the team is getting from each batter.
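To see why OPS misses this, here is the hypothetical worked through with the standard OBP and SLG definitions. Neither formula has any input for who was on base, so the ten bases-loaded doubles, which drove in runs every time, still come out at an OPS of 3.000 against 4.000 for the bases-empty triples.

```python
def obp(h: int, bb: int, hbp: int, ab: int, sf: int) -> float:
    """On-base percentage."""
    return (h + bb + hbp) / (ab + bb + hbp + sf)


def slg(singles: int, doubles: int, triples: int, hr: int, ab: int) -> float:
    """Slugging percentage: total bases per at-bat."""
    return (singles + 2 * doubles + 3 * triples + 4 * hr) / ab


# Player 1: ten triples in ten at-bats, bases empty every time.
ops_triples = obp(h=10, bb=0, hbp=0, ab=10, sf=0) + slg(0, 0, 10, 0, ab=10)
# Player 2: ten doubles in ten at-bats, bases loaded every time.
ops_doubles = obp(h=10, bb=0, hbp=0, ab=10, sf=0) + slg(0, 10, 0, 0, ab=10)
print(ops_triples, ops_doubles)
```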

As I mentioned above, the RC stat can be used to see how well batters performed in each type of situation, such as with the bases loaded. Below are visualizations of this measure for all of the players listed in the chart above. Notice the line at the zero mark in each chart. Points above the line represent positive run values contributed in each base state. Points below represent negative run values, such as when the batter makes an out or produces some other low-value outcome like a sacrifice bunt. The more points above the line, the better. Please note that you may not be able to see all the points, as some overlap each other. This can make a few of the charts look skewed.

In the next post I will use this research, along with other baseball statistics and research, to create an optimized lineup for Wichita State. Preliminary analysis suggests Shocker baseball could be slotting players in the order differently to achieve better results over the course of the season. As it stands, players are being placed in the batting order in spots that do not maximize their individual strengths.

Is Wichita State Baseball Suffering From Bad Luck?

Tuesday, March 8, 2016

Wichita State baseball is eleven games into its 2016 season, and things have not started off well. It is very early and WSU can still turn it around, but that process needs to start now. The schedule from here on out is going to get tough, to say the least, including one-off games against rivals Oklahoma and Oklahoma State as well as three-game series with Cal State Fullerton and Nebraska.

By this point we have a decent idea of what type of performance the team is getting from the everyday starters and even some of the most-used replacements. I scraped play-by-play data as well as team statistics from the NCAA website to take a closer look at what has gone wrong thus far and to see if anything can be done to correct this trend.

A good starting point is to look at the team's runs scored and runs allowed. We can use these numbers to find the team's run differential. This metric is important because it is a fairly solid predictor of a team's win-loss record. The thought process is that if a team scores lots of runs and allows very few, it will win a majority of its games. As I am writing this, through the first eleven games, Wichita State has scored 78 runs and allowed 77. Intuitively one would conclude that WSU should be a .500 team, and that would be correct. Bill James, famed sabermetrician and a consultant to the Boston Red Sox, came up with a neat way of using a team's runs scored and runs allowed to predict its winning percentage, and the math is usually fairly accurate. His formula resembles the Pythagorean theorem and looks like this:

Winning percentage = Runs Scored ^ 2 / (Runs Scored ^ 2 + Runs Allowed ^ 2)

Squaring the variables provides a close estimate for Major League Baseball, but to get a more accurate figure the exponent needs to be calculated for the college run environment. I will spare you the gory math and just say that I did this using the NCAA play-by-play data and came up with an exponent of 1.8. Using this exponent in Bill James' formula, I came up with a predicted win total for WSU of 5 to 6 games through the first 11 games of this season.
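The formula is easy to run with any exponent. Below is a minimal sketch that plugs in the 78 runs scored, 77 runs allowed, and the college exponent of 1.8 from this post, plus last season's 342 and 337. It gives about 5.6 expected wins through 11 games this year and about 50.7% for 2015, in line with the 5 to 6 wins and the 51% figure quoted in this post.

```python
def pythag_win_pct(runs_scored: float, runs_allowed: float,
                   exponent: float = 2.0) -> float:
    """Bill James' Pythagorean expectation with an adjustable exponent."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


# WSU 2016 through 11 games, using the college exponent of 1.8.
pct = pythag_win_pct(78, 77, exponent=1.8)
print(f"{pct:.3f} -> {pct * 11:.1f} expected wins in 11 games")

# 2015 season totals: 342 scored, 337 allowed.
print(f"{pythag_win_pct(342, 337, exponent=1.8):.1%}")
```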

There is an easy explanation for why WSU has underperformed its Pythagorean estimate. The run differential in the games they won is much larger than in the games they lost. Their four wins came by an average margin of 7.25 runs, while their losses came by an average of 3.5 runs. So what does this mean? Basically, WSU can score runs, but they also allow a ton of runs. They need to find a way to keep games closer. The 3.5-run margin in their losses is not great, but it is better than last year, when the team lost by an average of 4.2 runs. Much like this year, last year the team underperformed its predicted win total: they scored 342 runs and allowed 337 but came away with a 44% winning percentage when the formula had them pegged at 51%. This year they are on pace to score more runs than last year, and last year they scored a lot. To put things in perspective, in 2012 the team went 35-25 while averaging 5.4 runs per game. Last year they averaged 5.7 runs per game, higher than in their winning 2012 season, and they had a losing record. This season they are scoring at a clip of 7.1 runs per game, yet they continue to struggle to win.

Logically one might conclude that pitching is the culprit, and that reasoning would be partially correct. There is no doubt that the pitching and defensive side of the game has failed WSU thus far, but is it the pitchers' fault, or has it been partly due to bad luck?

Take a look at the chart below showing the batting average on balls in play (BABIP) for this year and last year. The chart shows the stats for the pitchers who have faced the most batters this season and for whom we have stats from last season.

* Sam Tewes was injured for most of 2015, so his 2014 stats were also used

For those of you unfamiliar with BABIP, it is a stat that gives us an idea of how batters have performed against a pitcher once they put the ball in play. In a sense, it isolates the batted balls that stay in the park, leaving out home runs and strikeouts. Every pitcher has a typical BABIP that he hovers around year in and year out. A pitcher with a higher-than-normal BABIP could be due for regression toward that norm and could start to see an improvement in performance. Those experiencing a lower-than-normal BABIP could be candidates for a decline in performance, as they are likely to see more balls in play go for hits against them.

By looking at the chart above, we can see that the Shockers have seen an abnormal number of balls in play go for hits. Is this because they are giving up harder-hit balls, or are opponents simply finding holes?

To answer this question we need to look at batted ball metrics, and I will use these same four pitchers as an example. Remember that these pitchers account for the majority of batters faced by the staff. The next chart compares the batted ball data (ground ball %, fly ball %, and line drive %) from this season to last.

Not good all the way around. It would seem that the high BABIPs are in fact due to an increase in the number of hard-hit balls, aka line drives. I will admit this batted ball data is not perfect, but I pulled the information from the NCAA website, so I think it gives us a reasonable estimate.

I want to take a look at one more stat to see if we can more accurately pinpoint the problem. This final table shows the left-on-base percentage (LOB%) for our four pitchers as well as for the team in 2016 and 2015.

This could be the problem. Three of the four pitchers have seen a significant decrease in this percentage. This means they are leaving fewer runners stranded on base. The hits are coming with runners on base, and this spells problems for any pitcher. In the Shockers' case, three of their four most relied-upon pitchers are allowing poorly timed hits at the same point in the season. It is no wonder they are struggling. Can this be corrected? Most definitely! In fact, many sabermetricians have researched LOB% and found that it tends to regress to the mean. For our four pitchers, this means good times are coming. Let's just hope that when they come, the offense is still firing on all cylinders so that the improvement can translate into wins.
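One simple way to express "regresses to the mean" is a weighted average of a pitcher's observed rate and a league mean, where a bigger sample earns more weight for the observed number. This is a minimal sketch of that general shrinkage idea, not the method used in the research mentioned above. The weight `k` and the 70% league mean are hypothetical placeholders, not values estimated from NCAA data.

```python
def regress_to_mean(observed: float, n: float, league_mean: float, k: float) -> float:
    """Shrink an observed rate toward the league mean.

    n is the sample size behind `observed` (e.g. baserunners allowed) and k is
    how many league-average observations to blend in. Both k and the league
    mean below are hypothetical placeholders.
    """
    return (observed * n + league_mean * k) / (n + k)


# A pitcher stranding 50% of 30 baserunners, pulled toward a made-up 70% mean.
print(round(regress_to_mean(0.50, n=30, league_mean=0.70, k=70), 3))
```

With only 30 baserunners behind it, the estimate moves most of the way toward the league mean; the same 50% over a full season of baserunners would move much less.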

Is it Possible to Sustain a Competitive Advantage in Baseball?

Monday, January 11, 2016

During this off-season I have been thinking a lot about why it appears major league teams can't perpetuate success once they have experienced it. By success, I mean making the playoffs. There are teams that do it, but it is rare. Why can't teams seem to make the playoffs multiple years in a row? Some of it has to do with the size of the team's payroll, but there are many playoff streaks in history where this was not the case.

The chart below displays the longest playoff streaks in baseball history since 1969. Please note, I chose to look only at seasons after 1968 because the modern playoff system was not implemented until 1969. During that time, 17 different franchises made the playoffs at least 3 seasons in a row, for a total of 28 streaks of 3 or more playoff appearances, since some franchises, like the Yankees, enjoyed multiple playoff runs. There are only nine playoff streaks of four or more seasons in a row.

The point of this chart is to show you the outliers. From 1969 on, every postseason included at least 4 teams, and from 1995 on, at least 8. This means the vast majority of teams that make the playoffs do not go on to sustain that success. They can't string multiple playoff appearances together. Why is this the case?
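Counting streaks like the ones in the chart is mostly bookkeeping over each franchise's list of playoff seasons. A minimal sketch, assuming you already have those lists; the seasons in the usage line are made up and are not any specific franchise's history. Note that any gap in years breaks a streak here, and the sketch does not special-case 1994, when no postseason was played.

```python
def playoff_streaks(seasons: list, min_length: int = 3) -> list:
    """Return (first, last) season pairs for runs of consecutive playoff years."""
    streaks = []
    start = prev = None
    for year in sorted(set(seasons)):
        if prev is not None and year == prev + 1:
            prev = year  # streak continues
            continue
        # Gap found: close out the previous run if it was long enough.
        if start is not None and prev - start + 1 >= min_length:
            streaks.append((start, prev))
        start = prev = year
    if start is not None and prev - start + 1 >= min_length:
        streaks.append((start, prev))
    return streaks


# Made-up franchise history: a 3-year run and a 4-year run.
print(playoff_streaks([2001, 2002, 2003, 2005, 2006, 2007, 2008]))
```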

The short answer is that over the years a majority of teams, with a few exceptions, have not been able to deploy methods and approaches that other teams cannot duplicate. For instance, the Oakland A's were one of the first teams to begin using sabermetrics to compete against teams that in the past had essentially been able to buy their way to the top. The analytic approach worked, but over the years these same statistical principles came to be used by many teams. Now even teams that the stat world considers anti-saber have at least a small quantitative department.

In my last post I briefly mentioned that "moneyball" tactics could be considered operational effectiveness and that success brought on by these tactics is unsustainable. I borrowed the term operational effectiveness from Michael Porter, famed strategist and Harvard professor. He used the term when referring to certain management tools, such as Six Sigma or total quality management (TQM), used by companies in their quest for productivity. Porter contends that operational effectiveness is necessary, but not sufficient, for creating a competitive advantage. In other words, companies cannot rely on Six Sigma or TQM to sustain an advantage over the competition. The techniques can be imitated by other companies, and therefore any advantage they bring will only last a short time. I am saying that baseball teams cannot rely on sabermetrics to sustain an advantage over the competition. All teams will begin to look the same. However, it doesn't take the whole league using advanced analytics for the advantage to wash away. The MLB playoff bracket has ten spots for teams to fill. Even 11 or 12 teams using sabermetrics, evenly distributed between the NL and AL, would cause the competitive advantage of using these advanced stats to disappear. The sustainability is lost.

Looking at the chart above, it would seem that sustaining a competitive advantage in baseball is possible, but very hard to do. Some teams, while their intentions are good, are not implementing strategies that bring about sustainable advantages. My goal is to create a conceptual framework for teams looking for continuous success. I am going to borrow ideas from Jay Barney, business professor at the University of Utah, to help me explain the framework.

I have been throwing around the phrase competitive advantage, and while I am sure most of you have a good idea of what it means, I will define it according to Barney's view. A firm has a competitive advantage when it is implementing a value-creating strategy not simultaneously being implemented by any current or potential competitor. In baseball it means nearly the same thing. For example, using sabermetrics to help value players can be, and is being, done by more than one team. That being said, there may be other resources used by one team that are not used by the others, giving that team the possibility of creating a competitive advantage.

Generating a competitive advantage is great, but we are seeking sustainability. For a competitive advantage to be sustainable, other firms (or in our case, other baseball teams) must be unable to replicate it. An example is the Royals' bullpen strategy. Using three different closer-type relievers for the seventh, eighth, and ninth innings can be, and now is being, duplicated by other teams. The strategy alone is not sustainable. What is sustainable is the resources used: Herrera, Davis, and Holland. These are three of the best relievers in the game. In any given year there are three relievers who are statistically the very best. Conceivably, if a team acquired those three relievers, it would have the necessary resources for building a sustained advantage. It is the resources that I contend are the key to a team being able to prolong success.

According to Barney, resources are all assets, capabilities, organizational processes, firm attributes, information, and knowledge controlled by a firm that enable it to conceive and implement strategies that improve its efficiency and effectiveness. These resources, while numerous, can be classified into three main categories: physical capital resources, human capital resources, and organizational capital resources. Listed below are these three resource categories with baseball examples.

1. Physical capital resources
  • Technology used within the baseball operations department
  • Geographic location of major league team, minor league teams, and academies.
  • Access to prospect centers (i.e., state, regional, and national player pools)
2. Human capital resources
  • Training of players, managers, and front office personnel
  • Experience of players, managers, and front office personnel
  • Judgement of players, managers, and front office personnel
3. Organizational capital resources
  • Teams' formal reporting structure
  • Teams' formal and informal planning (strategy)
  • Teams' informal relations among groups within the team and with other teams
These three main resources, and the many that fall under their headings, will always give teams the potential to create a competitive advantage if they have four crucial attributes. Before I get into those attributes, I need to mention another major qualifier for the resource-based framework to be applied properly. For a team to sustain a competitive advantage, and in turn string together multiple playoff runs, resources across the baseball industry must be heterogeneous. If all teams had the same resources, then logically no one could create an advantage under this model. Luckily, baseball teams most certainly have unique resources, and I do not see this changing anytime soon. OK, back to the four attributes that a resource must have for it to potentially generate a sustainable advantage for the team.

First, the resource must be valuable, in the sense that it exploits opportunities and neutralizes threats in the team's environment. Take the resource of human capital. Using this logic, a team could hire very experienced front office personnel and take advantage of their wisdom in assembling a team. At the same time, if the front office is truly a cut above the rest, its proprietary knowledge would also neutralize threats from competing teams. Other teams could not hire employees who matched that front office's level of experience and judgment. Another example comes from organizational capital. For instance (and this is completely hypothetical), a team could set up a network of academies and scouts in Russia and establish relationships that would give it a first-mover advantage. This would exploit opportunities and neutralize threats if, along with this resource, the team were to hire or train scouts who could build unique relationships with prospects, their coaches, and managers.

Second, the resource must be imperfectly imitable. A resource can be imperfectly imitable for one or a combination of three reasons. One - the team's ability to obtain the resource depends on unique historical conditions. Think about how the collective bargaining agreement has changed over the years. At one time teams had a lot more power than the players. Let's say a resource (a player) was obtained by an organization during a period when teams had more power than the players, making the contract more advantageous for the team. Let's also say this player was a superstar who gave his team a great advantage on the field. A few years later a new agreement comes out giving more power to the players, but the superstar had already signed with the team before that new agreement. In this case, history was on the side of the team that acquired the superstar. Other teams would not have the same advantage after the agreement changed. They could not sign a similar-caliber player for the same inflation-adjusted rate as the team that signed the superstar under the old agreement.

Two - the link between the resource possessed and the team's sustained advantage is causally ambiguous. This exists when the resource controlled is not understood, or is only slightly understood. In layman's terms, this is the "it" factor that no one can put their finger on. It may be created intentionally by a team or come about by accident. An example would be the Cardinals' recent success. We know they have had some good players, but there also seems to be something going on behind the scenes, unavailable to the public, that has led to their continued winning ways. Maybe the Cardinals' entire strategy or process is so complex that others can't wrap their heads around it. The key here is that other teams cannot replicate their strategy because they can't understand how they have done it. In my opinion, this sub-attribute is crucial to a baseball team sustaining success.

Three - the resource generating the team's advantage is socially complex. For example, the ball club may have superior interpersonal relationships between managers and front office staff. The club may also have a better reputation in prospect regions, meaning it is seen as a better organization to play for than the others. Maybe all of the prospects in Panama see the Red Sox as the best team to play for and would be willing to make trade-offs in order to sign with them. Another example involves the technology used by front offices. Teams may possess the same technology, but only one may be able to use its complex social resources (scouts, managers, etc.) to harness it. Think about the problems between Dipoto and Scioscia in Anaheim. The Angels were clearly not able to successfully use the technology throughout the entire organization.

Third, for a resource to have the potential to sustain a competitive advantage for a team, it must be rare among current and potential competitors. An obvious example is drafted players who turn out to be superstars. These types of players are scarce, which shows just how important drafting talent is to the sustainability of organizational success. However, the notion of substitutes comes into play here and brings us to the final attribute a resource must possess to help a team sustain a competitive advantage.

Fourth, there cannot be strategically equivalent substitutes for the resource. If other teams can create close substitutes for having a superstar, then the resource (the superstar) will not be able to sustain an advantage for his team. This doesn't have to be limited to just one player, either. If a team has three or four superstars, but another team is able to use substitutes to match their capabilities, then those superstars would not be a source of sustainability. They may lead to short-term success, but if another team can duplicate that success by deploying a different set of resources, then the superstars cannot create sustained success. This may be confusing, so I will use a very easy example. In 2002, the Oakland A's, using advanced statistics, were able to outperform teams that possessed considerably better individual talent, aka superstars. That year showed that the Yankees' superstar players could be substituted with an optimized roster of average to slightly above average players. Around the time Dipoto left the Angels, Scioscia may have thought that his close-knit, highly experienced management team was a close substitute for Dipoto's analytics department. In some respects he might have been right. I am by no means saying that a team should not use sabermetrics, only that there could be a very different substitute that produces similar results. In my opinion, a baseball organization with a close-knit, highly experienced management team that also embraces a unique and intelligent analytics department would have an advantage over a team like the one Mike Scioscia presumably envisioned.

If teams truly have the ability to create a formula that can sustain winning, I believe the key lies within the resources of the entire organization. How a team uses its resources is just as important. I laid out four attributes that a resource must have in order to create a sustainable advantage. As you can see, and as I assume you already suspected, this is very hard to achieve. In fact, according to the chart it has only been done a few times, and depending on how you judge it, the Yankees are really the only team to do it frequently. One might say this is unfair because the Yankees typically have the higher payroll. This is true, but money is very much a resource, and a team that controls it can deploy additional resources unavailable to teams without it. This is a harsh reality for some teams, but it has been shown in the past that in baseball there are close substitutes for money. If there weren't, the Yankees and Red Sox would win every year.

I hope everyone understands what I was trying to accomplish here. Mostly I wanted to take a high-level approach to constructing a strategy for teams. It may not be enough in the future to simply be good at sabermetrics. If nothing else, it should be clear why it is so difficult for teams to win year in and year out. Teams must decide on the correct resources to use in their overall strategy (ones that have the four attributes). Those resources must be deployed almost perfectly, and even then a good deal of luck is involved in some cases. The resource-based framework is not the only approach to strategy, but in my mind it could work if applied to baseball.