Jonathan Weisberg
http://jonathanweisberg.org/index.xml
Recent content on Jonathan Weisberg
The Mosteller Hall Puzzle
http://jonathanweisberg.org/post/Teaching%20Monty%20Hall/
Wed, 14 Jun 2017 15:21:42 -0500
<p>One of my favourite probability puzzles to teach is a close cousin of the <a href="https://en.wikipedia.org/wiki/Monty_Hall_problem" target="_blank">Monty Hall problem</a>. Originally from a 1965 <a href="https://books.google.ca/books/about/Fifty_Challenging_Problems_in_Probabilit.html?id=QiuqPejnweEC" target="_blank">book by Frederick Mosteller</a>,<sup class="footnote-ref" id="fnref:1"><a rel="footnote" href="#fn:1">1</a></sup> here’s my formulation:</p>
<blockquote>
<p>Three prisoners, A, B, and C, are condemned to die in the morning. But the king decides in the night to pardon one of them. He makes his choice at random and communicates it to the guard, who is sworn to secrecy. She can only tell the prisoners that one of them will be released at dawn.</p>
<p>Prisoner A welcomes the news, as he now has a 1/3 chance of survival. Hoping to go even further, he says to the guard, “I know you can’t tell me whether I am condemned or pardoned. But at least one other prisoner must still be condemned, so can you just name one who is?”. The guard replies (truthfully) that B is still condemned. “Ok”, says A, “then it’s either me or C who was pardoned. So my chance of survival has gone up to ½”.</p>
<p>Unfortunately for A, he is mistaken. But how?</p>
<p><strong>Update</strong>: turns out the puzzle isn’t originally due to Mosteller after all! It appears in <a href="https://www.nature.com/scientificamerican/journal/v201/n4/pdf/scientificamerican1059-174.pdf" target="_blank">a 1959 article</a> in <em>Scientific American</em>, by Martin Gardner.</p>
</blockquote>
<p>For me it’s really intuitive that A is mistaken. The way he figures things, his chance of survival will go up to ½ whoever the guard names in her response. But then A doesn’t even have to bother the guard. He can just skip ahead to the conclusion that his chance of survival is ½. And that’s absurd.</p>
<p>It’s a bit harder to say exactly <em>where</em> A goes wrong. But I’ve always taken this puzzle to be, like Monty Hall, a lesson in Carnap’s TER: the Total Evidence Requirement.</p>
<p>What A learns isn’t only that B is condemned, but also that the guard reports as much. And this report is more likely if C was pardoned than if A was. If C was pardoned, the guard had to name B, the only other prisoner still condemned. Whereas if A was pardoned, the guard could just as easily have named C instead.</p>
<p>So when the guard names B, her report fits twice as well with the hypothesis that C was pardoned, not A:</p>
<p><img src="http://jonathanweisberg.org/img/misc/mosteller_tree_diagram.png" alt="Tree diagram" /></p>
<p>Thus A’s chance of being condemned remains twice that of being pardoned.</p>
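<p>Here is a minimal sketch of the tree-diagram calculation in Python. It assumes, as the argument above does, that the guard picks between B and C with a fair coin flip when A is the one pardoned; the variable names are just illustrative.</p>

```python
from fractions import Fraction

third, half = Fraction(1, 3), Fraction(1, 2)

# Each branch of the tree: (who was pardoned, who the guard names, probability).
branches = [
    ("A", "B", third * half),  # A pardoned: guard may name B or C (assumed fair coin)
    ("A", "C", third * half),
    ("B", "C", third),         # B pardoned: guard must name C
    ("C", "B", third),         # C pardoned: guard must name B
]

# Keep only the branches where the guard names B, then renormalize.
named_b = [(pardoned, p) for pardoned, named, p in branches if named == "B"]
total = sum(p for _, p in named_b)
posterior = {pardoned: p / total for pardoned, p in named_b}

print(posterior["A"], posterior["C"])  # 1/3 2/3
```

<p>The branch where C was pardoned carries twice the weight of the branch where A was, which is all the 2:1 ratio amounts to.</p>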
<p>If you’re like me, this reasoning will actually be less intuitive than the initial, gut feeling that A must be mistaken (because his logic would make it unnecessary to consult the guard). The argument is still instructive though, for several reasons:</p>
<ol>
<li><p>It shows how the initial, gut feeling is consistent with the probability axioms. We’ve constructed a plausible probability model that vindicates it.</p></li>
<li><p>The Total Evidence Requirement makes the difference in this model. Learning merely that B is condemned would have a different effect in this model. A’s chance of survival really would go up to ½ then.</p></li>
<li><p>These lessons can be carried over to Monty Hall. The same model yields the correct solution there, with the TER playing out in a parallel way.</p></li>
</ol>
<p>And that last point is the real point of this post. As my colleague <a href="http://www.sergiotenenbaum.org/" target="_blank">Sergio Tenenbaum</a> pointed out in conversation, it means you can use Mosteller’s puzzle to teach Monty Hall. Because, unlike in Monty Hall, <em>the intuitive judgment is the correct one in Mosteller’s puzzle</em>. So you can use it to get students on board with the less intuitive (but entirely correct) argument we used to resolve Mosteller’s puzzle.</p>
<p>Once students have seen how important it is to set up the probability model correctly, so that the Total Evidence Requirement can do its work, they may be more comfortable using the same technique on Monty Hall.</p>
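<p>To see the Total Evidence Requirement doing its work, here is a small sketch that updates the same prior on two different pieces of evidence: the bare proposition that B is condemned, and the fuller fact that the guard <em>says</em> B is condemned. The <code>condition</code> helper and the fair-coin assumption about the guard are mine, for illustration only.</p>

```python
from fractions import Fraction

prior = {"A": Fraction(1, 3), "B": Fraction(1, 3), "C": Fraction(1, 3)}

def condition(prior, likelihood):
    """Bayes' theorem: reweight each hypothesis by how likely it makes the evidence."""
    joint = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(joint.values())
    return {h: joint[h] / total for h in joint}

# Evidence 1: merely "B is condemned", which is certain unless B was pardoned.
b_condemned = {"A": Fraction(1), "B": Fraction(0), "C": Fraction(1)}
# Evidence 2: "the guard says B is condemned" (assumed fair coin if A was pardoned).
guard_says_b = {"A": Fraction(1, 2), "B": Fraction(0), "C": Fraction(1)}

partial = condition(prior, b_condemned)
full = condition(prior, guard_says_b)
print(partial["A"], full["A"])  # 1/2 1/3
```

<p>Relabel the prisoners as doors and the guard as Monty, and the same two updates give the tempting ½ answer and the correct ⅓ answer for sticking with your door.</p>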
<p>There are other ways of bringing students around to the correct solution to Monty Hall, of course. You can run them through a variant with a hundred doors instead of three; you can invite them to consider what would happen in the long run in repeated games; you can ask them how things would have been different had Monty opened the other door instead.</p>
<p>These are all worthy heuristics. And I expect different ones will click for different students.</p>
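<p>The long-run heuristic is easy to try out. Here is a rough simulation sketch of repeated three-door games; the function name, the seed, and the number of trials are arbitrary choices of mine.</p>

```python
import random

def play(switch, rng):
    """Play one three-door game; return True if the contestant wins the prize."""
    doors = [0, 1, 2]
    prize = rng.choice(doors)
    pick = rng.choice(doors)
    # Monty opens a door that is neither the contestant's pick nor the prize.
    opened = rng.choice([d for d in doors if d != pick and d != prize])
    if switch:
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == prize

rng = random.Random(0)  # fixed seed so the run is repeatable
trials = 100_000
stay = sum(play(False, rng) for _ in range(trials)) / trials
swap = sum(play(True, rng) for _ in range(trials)) / trials
print(f"stay: {stay:.3f}, switch: {swap:.3f}")
```

<p>The two frequencies should land near 1/3 and 2/3 respectively.</p>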
<p>But for my money, there’s nothing like a simple and concrete model to help me get oriented and shake off that befuddled feeling. And, in this case, Mosteller’s puzzle helps make the model more intuitive, hence more memorable.</p>
<p><img src="https://68.media.tumblr.com/776dfc1f8b3baa0309b41c6a90ea1a13/tumblr_nd53ozBNz81qj0u7fo1_r1_400.gif" alt="Fainting Goat" /></p>
<div class="footnotes">
<hr />
<ol>
<li id="fn:1">So I think it actually predates Monty Hall, though I gather this general family of puzzles goes back at least to 1889 and <a href="https://en.wikipedia.org/wiki/Bertrand%27s_box_paradox" target="_blank">Bertrand’s box paradox</a>.
<a class="footnote-return" href="#fnref:1"><sup>[return]</sup></a></li>
</ol>
</div>
Accuracy for Dummies, Part 7: Dominance
http://jonathanweisberg.org/post/Accuracy%20for%20Dummies%20-%20Part%207%20-%20Brier%20Dominance/
Wed, 07 Jun 2017 00:00:00 -0500
<p>In our <a href="http://jonathanweisberg.org/post/Accuracy for Dummies - Part 5 - Convexity/">last</a> <a href="http://jonathanweisberg.org/post/Accuracy for Dummies - Part 6 - Obtusity/">two</a> posts we established two key facts:</p>
<ol>
<li>The set of possible probability assignments is convex.</li>
<li>Convex sets are “obtuse”. Given a point outside a convex set, there’s a point inside that forms a right-or-obtuse angle with any third point in the set.</li>
</ol>
<p>Today we’re putting them together to get the central result of the accuracy framework, the Brier dominance theorem. We’ll show that a non-probabilistic credence assignment is always “Brier dominated” by some probabilistic one. That is, there is always a probabilistic assignment that is closer, in terms of Brier distance, to every possible truth-value assignment.</p>
<p>In fact we’ll show something a bit more general. We’ll show that there’s a probability assignment that’s closer to all the possible <em>probability</em> assignments. But truth-value assignments are probability assignments, just extreme ones. So the result we really want follows straight away as a special case.$
\renewcommand{\vec}[1]{\mathbf{#1}}
\newcommand{\x}{\vec{x}}
\newcommand{\y}{\vec{y}}
\newcommand{\z}{\vec{z}}
\newcommand{\v}{\vec{v}}
\newcommand{\p}{\vec{p}}
\newcommand{\q}{\vec{q}}
\newcommand{\B}{B}
\newcommand{\R}{\mathbb{R}}
\newcommand{\EIpq}{EI_{\p}(\q)}\newcommand{\EIpp}{EI_{\p}(\p)}
$</p>
<h1 id="recap">Recap</h1>
<p>For reference, let’s collect our notation, terminology, and previous results, so that we have everything in one place.</p>
<p>We’re using $n$ for the number of possibilities under consideration. And we use bold letters like $\x$ and $\p$ to represent $n$-tuples of real numbers. So $\p = (p_1, \ldots, p_n)$ is a point in $n$-dimensional space: a member of $\R^n$.</p>
<p>We call $\p$ a <em>probability assignment</em> if its coordinates are (a) all nonnegative, and (b) they sum to $1$. And we write $P$ for the set of all probability assignments.</p>
<p>We call $\v$ a <em>truth-value assignment</em> if its coordinates are all zeros except for a single $1$. And we write $V$ for the set of all truth-value assignments.</p>
<p>A point $\y$ is a <em>mixture</em> of the points $\x_1, \ldots, \x_n$ if there are real numbers $\lambda_1, \ldots, \lambda_n$ such that:</p>
<ul>
<li>$\lambda_i \geq 0$ for all $i$,</li>
<li>$\lambda_1 + \ldots + \lambda_n = 1$, and</li>
<li>$\y = \lambda_1 \x_1 + \ldots + \lambda_n \x_n$.</li>
</ul>
<p>We say that a set is <em>convex</em> if it is closed under mixing, i.e. any mixture of elements in the set is also in the set.</p>
<p>The difference between two points, $\x - \y$, is defined coordinate-wise:
$$ \x - \y = (x_1 - y_1, \ldots, x_n - y_n). $$
The <em>dot product</em> of two points $\x$ and $\y$ is written $\x \cdot \y$, and is defined:
$$ \x \cdot \y = x_1 y_1 + \ldots + x_n y_n. $$
As a reminder, the dot product returns a single, real number (not another $n$-dimensional point as one might expect). And the sign of the dot product reflects the angle between $\x$ and $\y$ when viewed as vectors/arrows. In particular, $\x \cdot \y \leq 0$ corresponds to a right-or-obtuse angle.</p>
<p>Finally, $\B(\x,\y)$ is the Brier distance between $\x$ and $\y$, which can be defined:
$$
\begin{align}
\B(\x,\y) &= (\x - \y)^2\\<br />
&= (\x - \y) \cdot (\x - \y).
\end{align}
$$</p>
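<p>For readers who like to see definitions run, here is a small sketch of the dot product and Brier distance in Python. The function names and the example credences are hypothetical, chosen only to illustrate the definitions above.</p>

```python
def dot(x, y):
    """Dot product: multiply coordinate-wise, then add up the results."""
    return sum(xi * yi for xi, yi in zip(x, y))

def sub(x, y):
    """Coordinate-wise difference x - y."""
    return tuple(xi - yi for xi, yi in zip(x, y))

def brier(x, y):
    """Brier distance: the dot product of (x - y) with itself."""
    d = sub(x, y)
    return dot(d, d)

x = (0.5, 0.25, 0.75)  # hypothetical credences; they sum to 1.5, so x is not in P
v = (1, 0, 0)          # the truth-value assignment where the first possibility obtains
print(brier(x, v))     # 0.875
```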
<p>Now let’s restate the two key theorems we’ll be relying on.</p>
<p><strong>Theorem (Convexity).</strong>
The set of probability assignments $P$ is convex.</p>
<p>We established this in <a href="http://jonathanweisberg.org/post/Accuracy for Dummies - Part 5 - Convexity/">Part 5</a> of this series. In particular, we showed that $P$ is the “convex hull” of $V$: the set of all mixtures of points in $V$.</p>
<p><strong>Lemma (Obtusity).</strong>
If $S$ is convex, $\x \not \in S$, and $\y \in S$ minimizes $\B(\y,\x)$ as a function of $\y$ on the domain $S$, then for any $\z \in S$, $(\x - \y) \cdot (\z - \y) \leq 0$.</p>
<p>The intuitive idea behind this lemma, which we proved last time in <a href="http://jonathanweisberg.org/post/Accuracy for Dummies - Part 6 - Obtusity/" target="_blank">Part 6</a>, can be illustrated with a diagram:
<img src="http://jonathanweisberg.org/img/accuracy/ObtusityLemma3.png" alt="" />
Given a point outside a convex set, we can find a point inside (the closest point) that forms a right-or-obtuse angle with all other points in the set.</p>
<p>What we’ll show next is the natural and intuitive consequence: that point $\y$ is thus closer to any point $\z$ of $S$ than $\x$ is.</p>
<h1 id="the-brier-dominance-theorem">The Brier Dominance Theorem</h1>
<p>Intuitively, we want to show that if the angle formed at point $\y$ with the points $\x$ and $\z$ is right-or-obtuse, then $\y$ must be closer to $\z$ than $\x$ is (in Brier distance).</p>
<p>Formally, a right-or-obtuse angle corresponds to a dot product less than or equal to zero: $(\x - \y) \cdot (\z - \y) \leq 0$. But if $\x = \y$, then the dot product will be zero trivially. So the precise statement of our theorem is:</p>
<p><strong>Theorem.</strong>
If $(\x - \y) \cdot (\z - \y) \leq 0$ and $\x \neq \y$, then $\B(\x,\z) > \B(\y,\z)$.</p>
<p><em>Proof.</em> To start, we establish a general identity via algebra:
$$
\begin{align}
\B(\x, \z) - \B(\x, \y) - \B(\y,\z)
&= (\x - \z)^2 - (\x - \y)^2 - (\y - \z)^2\\<br />
&= -2\y^2 - 2 \x \cdot \z + 2 \x \cdot \y + 2 \y \cdot \z\\<br />
&= -2 (\x - \y) \cdot (\z - \y).
\end{align}
$$
Now suppose $ (\x - \y) \cdot (\z - \y) \leq 0$. Then, given the negative sign on the $-2$ in the established identity,
$$ \B(\x, \z) - \B(\x, \y) - \B(\y,\z) \geq 0, $$
from which we derive
$$ \B(\x, \z) \geq \B(\x, \y) + \B(\y,\z). $$
Now, since $\x \neq \y$ by hypothesis, $\B(\x,\y) > 0$. Thus $\B(\x,\z) > \B(\y,\z)$, as desired.
<span class="floatright">$\Box$</span></p>
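<p>If you’d like a sanity check on the algebra, here is a quick numerical sketch. It draws random points (the dimension, seed, and sample size are arbitrary choices of mine), compares the two sides of the identity, and counts any cases where the angle at $\y$ is right-or-obtuse but $\y$ fails to be closer to $\z$. It is a spot check, not a substitute for the proof.</p>

```python
import random

def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))

def sub(x, y):
    return [xi - yi for xi, yi in zip(x, y)]

def brier(x, y):
    d = sub(x, y)
    return dot(d, d)

rng = random.Random(1)
max_gap = 0.0   # largest discrepancy found between the two sides of the identity
violations = 0  # right-or-obtuse cases where y is NOT closer to z than x is

for _ in range(1000):
    x, y, z = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(3)]
    lhs = brier(x, z) - brier(x, y) - brier(y, z)
    rhs = -2 * dot(sub(x, y), sub(z, y))
    max_gap = max(max_gap, abs(lhs - rhs))
    if dot(sub(x, y), sub(z, y)) <= 0 and not brier(x, z) > brier(y, z):
        violations += 1

print(max_gap, violations)
```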
<p>It follows now that if $\x$ isn’t a probability assignment, there’s a probability assignment that’s closer to every truth-value assignment than $\x$ is.</p>
<p><strong>Corollary (Brier Dominance).</strong> If $\x \not \in P$ then there is a $\p \in P$ such that $\B(\p,\v) < \B(\x, \v)$ for all $\v \in V$.</p>
<p><em>Proof.</em> Fix $\x \not \in P$, and let $\p$ be the member of $P$ that minimizes $\B(\y,\x)$ as a function of $\y$. The Convexity theorem tells us that $P$ is convex, so the Obtusity lemma implies $(\x - \p) \cdot (\v - \p) \leq 0$ for every $\v \in V$. And since $\x \neq \p$ (because $\x \not \in P$), the last theorem entails $\B(\p,\v) < \B(\x, \v)$, as desired.
<span class="floatright">$\Box$</span></p>
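<p>The proof only says that a closest $\p$ exists; it doesn’t say how to find one. As an illustration, here is a sketch that finds it with a standard sort-based projection onto the probability simplex. That algorithm is my addition, not something from this series, and the example credences are hypothetical. The sketch then checks that the resulting $\p$ is closer to every truth-value assignment.</p>

```python
def brier(x, y):
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))

def closest_probability(x):
    """Closest point to x (in Brier distance) whose coordinates are
    nonnegative and sum to 1, via the sort-based simplex projection."""
    u = sorted(x, reverse=True)
    cumulative, shift = 0.0, 0.0
    for k, u_k in enumerate(u, start=1):
        cumulative += u_k
        candidate = (cumulative - 1) / k
        if u_k - candidate > 0:  # the k largest coordinates stay positive
            shift = candidate
    return [max(xi - shift, 0.0) for xi in x]

def truth_values(n):
    """All n truth-value assignments: a single 1, zeros elsewhere."""
    return [[1.0 if j == i else 0.0 for j in range(n)] for i in range(n)]

x = [0.6, 0.6, 0.3]  # hypothetical credences; they sum to 1.5, so x is not in P
p = closest_probability(x)
for v in truth_values(len(x)):
    print(v, brier(p, v) < brier(x, v))
```

<p>Every line should end in <code>True</code>: the probabilistic $\p$ does better no matter which possibility turns out to be actual.</p>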
<p>This is the core of the main result we’ve been working towards. Hooray! But, we still have one piece of unfinished business. For what if $\p$ is itself dominated??</p>
<h1 id="undominated-dominance">Undominated Dominance</h1>
<p>We’ve shown that credences which violate the probability axioms are always “accuracy dominated” by some assignment of credences that obeys those axioms. But what if those dominating, probabilistic credences are themselves dominated? <em>What if they’re dominated by non-probabilistic credences??</em></p>
<p>For all we’ve said, that’s a real possibility. And if it actually obtains, then there’s nothing especially accuracy-conducive about the laws of probability. So we had better rule this possibility out. Luckily, that’s pretty easy to do.</p>
<p>In fact, the real work here was already done back in <a href="http://jonathanweisberg.org/post/Accuracy for Dummies - Part 3/">Part 3</a> of the series. There we showed that Brier distance is a “proper” measure of inaccuracy: each probability assignment expects itself to do best with respect to accuracy, if inaccuracy is measured by Brier distance.</p>
<p>As a reminder, we wrote $\EIpq$ for the expected inaccuracy of probability assignment $\q$ according to assignment $\p$. When inaccuracy is measured in terms of Brier distance:
$$ \EIpq = p_1 \B(\q,\v_1) + p_2 \B(\q,\v_2) + \ldots + p_n \B(\q,\v_n). $$
Here $\v_i$ is the truth-value assignment with a $1$ in the $i$-th coordinate, and $0$ everywhere else. What we showed in Part 3 was:</p>
<p><strong>Theorem.</strong>
$\EIpq$ is uniquely minimized when $\q = \p$.</p>
<p>And notice, this rules out there being some $\q \neq \p$ such that $\B(\q,\v_i) \leq \B(\p,\v_i)$ for all $i$. For if there were, the weighted average $\EIpq$ would have to be no larger than $\EIpp$. And this contradicts the theorem, which says that $\EIpq > \EIpp$ for all $\q \neq \p$.</p>
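<p>Here is a small sketch of the expected inaccuracy calculation, using a hypothetical $\p$ and a handful of hypothetical rivals. It only spot-checks the theorem on these examples; the function names are mine.</p>

```python
def brier(x, y):
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))

def expected_inaccuracy(p, q):
    """EI_p(q): the p-weighted average of q's Brier distance from each v_i."""
    n = len(p)
    total = 0.0
    for i in range(n):
        v_i = [1 if j == i else 0 for j in range(n)]
        total += p[i] * brier(q, v_i)
    return total

p = (0.5, 0.25, 0.25)
rivals = [(0.25, 0.5, 0.25), (0.5, 0.5, 0.0), (1, 0, 0), (0.4, 0.3, 0.3)]

print(expected_inaccuracy(p, p))  # 0.625
for q in rivals:
    print(q, expected_inaccuracy(p, q) > expected_inaccuracy(p, p))
```

<p>Each rival does worse in expectation by $\p$’s lights, so none of them can be at least as close as $\p$ to every $\v_i$.</p>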
<p>So, at long last, we have the full result we want:</p>
<p><strong>Corollary (Undominated Brier Dominance).</strong> If $\x \not \in P$ then there is a $\p \in P$ such that $\B(\p,\v) < \B(\x, \v)$ for all $\v \in V$. Moreover, there is no $\q \in P$ with $\q \neq \p$ such that $\B(\q,\v) \leq \B(\p, \v)$ for all $\v \in V$.</p>
<p>So the laws of probability really are specially conducive to accuracy, as measured using Brier distance. Only probabilistic credence assignments are undominated.</p>
<h1 id="where-to-next">Where to Next?</h1>
<p>That’s a pretty sweet result. And it raises plenty of fun and interesting questions we could look at next. Here are three:</p>
<ol>
<li><p>What about other ways of measuring inaccuracy besides Brier? Are there reasonable alternatives, and if so, do similar results apply to them?</p></li>
<li><p>What about other probabilistic principles, like Conditionalization, the Principal Principle, or the Principle of Indifference? Can we take this approach beyond the probability axioms?</p></li>
<li><p>Speaking of the probability axioms, we’ve been working with a pretty pared down conception of a “probability assignment”. Usually we assign probabilities not just to atomic possibilities, but to disjunctions/sets of possibilities: e.g. “the prize is behind either door #1 or door #2”. Can we extend this result to such “super-atomic” probability assignments?</p></li>
</ol>
<p>We’ll tackle some or all of these questions in future posts. But I haven’t yet decided which ones or in what order.</p>
<p>So for now let’s just stop and appreciate the work we’ve already done. Because not only have we proved one of the most central and interesting results of the accuracy framework. But also, in a lot of ways the hardest work is already behind us. If you’ve come this far, I think you deserve a nice pat on the back.</p>
<p><img src="http://i1145.photobucket.com/albums/o503/KimmieRocks/tumblr_liqmv89ru51qb2dn6.gif" alt="" /></p>
Journal Submission Rates by Gender: A Look at the APA/BPA Data
http://jonathanweisberg.org/post/A%20Look%20at%20the%20APA-BPA%20Data/
Tue, 06 Jun 2017 11:45:04 -0500
<p><strong>Update:</strong> <em>editors at CJP and Phil Quarterly have kindly shared some important, additional information. See the edit below for details.</em></p>
<p>A <a href="https://link.springer.com/article/10.1007/s11098-017-0919-0" target="_blank">new paper</a> on the representation of women in philosophy journals prompted some debate in the philosophy blogosphere last week. The paper found women to be underrepresented across a range of prominent journals, yet overrepresented in the two journals studied where review was non-anonymous.</p>
<p>Commenters <a href="http://dailynous.com/2017/05/26/women-philosophy-journals-new-data/" target="_blank">over at Daily Nous</a> complained about the lack of base-rate data. How many of the submissions to these journals were from women? In some respects, it’s hard to know what to make of these findings without such data.</p>
<p>A few commenters linked to <a href="http://www.apaonline.org/resource/resmgr/journal_surveys_2014/apa_bpa_survey_data_2014.xlsx" target="_blank">a survey</a> conducted by the APA and BPA a while back, which supplies some numbers along these lines. I was surprised, because I’ve wondered about these numbers, but I didn’t recall seeing this data-set before. I was excited too because the data-set is huge, in a way: it covers more than 30,000 submissions at 40+ journals over a span of three years!</p>
<p>So I was keen to give it a closer look. This post walks through that process. But I should warn you up front that the result is kinda disappointing.</p>
<h1 id="initial-reservations">Initial Reservations</h1>
<p>Right away some conspicuous omissions stand out.<sup class="footnote-ref" id="fnref:1"><a rel="footnote" href="#fn:1">1</a></sup> A good number of the usual suspects aren’t included, like <em>Philosophical Studies</em>, <em>Analysis</em>, and <em>Australasian Journal of Philosophy</em>. So the usual worries about response rates and selection bias apply.</p>
<p>The data are also a bit haphazard and incomplete. Fewer than half of the journals that responded included gender data. And some of those numbers are suspiciously round.</p>
<p>Still, there’s hope. We have data on over ten thousand submissions even after we exclude journals that didn’t submit any gender data. As long as they paint a reasonably consistent picture, we stand to learn a lot.</p>
<h1 id="first-pass">First Pass</h1>
<p>For starters we’ll just do some minimal cleaning. We’ll exclude data from 2014, since almost no journals supplied it. And we’ll lump together the submissions from the remaining three years, 2011–13, since the gender data isn’t broken down by year.</p>
<p>We can then calculate the following cross-journal tallies for 2011–13:</p>
<table>
<thead>
<tr>
<th align="left"></th>
<th align="right">Accepted submissions</th>
<th align="right">Rejected submissions</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">Men</td>
<td align="right">792</td>
<td align="right">9104</td>
</tr>
<tr>
<td align="left">Women</td>
<td align="right">213</td>
<td align="right">1893</td>
</tr>
</tbody>
</table>
<p>The difference here looks notable at first: 17.5% of submitted papers came from women compared with 21.2% of accepted papers, a statistically significant difference (<em>p</em> = 0.002).</p>
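<p>For anyone who wants to check the arithmetic, here is a sketch in Python that recomputes the two percentages from the table and runs a chi-square test with a continuity correction. Which test produced the <em>p</em>-value quoted above isn’t stated here, so the choice of test is an assumption on my part, though it comes out near the reported value.</p>

```python
import math

accepted = {"men": 792, "women": 213}
rejected = {"men": 9104, "women": 1893}

women_submitted = accepted["women"] + rejected["women"]
total_submitted = women_submitted + accepted["men"] + rejected["men"]
total_accepted = accepted["men"] + accepted["women"]

share_submitted = women_submitted / total_submitted  # women's share of submissions
share_accepted = accepted["women"] / total_accepted  # women's share of acceptances

def chi_square_p(a, b, c, d):
    """p-value for the 2x2 table [[a, b], [c, d]], using a chi-square test
    with Yates' continuity correction (1 degree of freedom)."""
    n = a + b + c + d
    numerator = n * max(abs(a * d - b * c) - n / 2, 0) ** 2
    stat = numerator / ((a + b) * (c + d) * (a + c) * (b + d))
    return math.erfc(math.sqrt(stat / 2))  # survival function of chi-square, 1 df

p_value = chi_square_p(accepted["men"], rejected["men"],
                       accepted["women"], rejected["women"])
print(f"{share_submitted:.1%} submitted, {share_accepted:.1%} accepted, p = {p_value:.3f}")
```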
<p>But if we plot the data by journal, the picture becomes much less clear:</p>
<p><img src="http://jonathanweisberg.org/img/apa_bpa_data_files/unnamed-chunk-3-1.png" alt="" /><!-- --></p>
<p>The dashed line<sup class="footnote-ref" id="fnref:2"><a rel="footnote" href="#fn:2">2</a></sup> indicates parity: where submission and acceptance rate would be equal. At journals above the line, women make up a larger portion of published authors than they do submitting authors. At journals below the line, it’s the reverse.</p>
<p>It’s pretty striking how much variation there is between journals. For example, <em>BJPS</em> is 12 points above the parity line while <em>Phil Quarterly</em> is 9 points below it.</p>
<p>It’s also notable that it’s the largest journals which diverge the most from parity: <em>BJPS</em>, <em>EJP</em>, <em>MIND</em>, and <em>Phil Quarterly</em>. (Note: <em>Hume Studies</em> is actually the most extreme by far. But I’ve excluded it from the plot because it’s very small, and as an extreme outlier it badly skews the <em>y</em>-axis.)</p>
<p>It’s hard to see all the details in the plot, so here’s the same data in a table.</p>
<table>
<thead>
<tr>
<th align="left">Journal</th>
<th align="right">Submissions</th>
<th align="right">Accepted</th>
<th align="left">% of submissions from women</th>
<th align="left">% of accepted papers from women</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">Ancient Philosophy</td>
<td align="right">346</td>
<td align="right">63</td>
<td align="left">20</td>
<td align="left">24</td>
</tr>
<tr>
<td align="left">British Journal for the Philosophy of Science</td>
<td align="right">1267</td>
<td align="right">117</td>
<td align="left">15</td>
<td align="left">27</td>
</tr>
<tr>
<td align="left">Canadian Journal of Philosophy</td>
<td align="right">792</td>
<td align="right">132</td>
<td align="left">20</td>
<td align="left">21</td>
</tr>
<tr>
<td align="left">Dialectica</td>
<td align="right">826</td>
<td align="right">74</td>
<td align="left">12.05</td>
<td align="left">15.48</td>
</tr>
<tr>
<td align="left">European Journal of Philosophy</td>
<td align="right">1554</td>
<td align="right">98</td>
<td align="left">11.84</td>
<td align="left">25</td>
</tr>
<tr>
<td align="left">Hume Studies</td>
<td align="right">152</td>
<td align="right">30</td>
<td align="left">23.7</td>
<td align="left">58.1</td>
</tr>
<tr>
<td align="left">Journal of Applied Philosophy</td>
<td align="right">510</td>
<td align="right">47</td>
<td align="left">20</td>
<td align="left">20</td>
</tr>
<tr>
<td align="left">Journal of Political Philosophy</td>
<td align="right">1143</td>
<td align="right">53</td>
<td align="left">35</td>
<td align="left">30</td>
</tr>
<tr>
<td align="left">MIND</td>
<td align="right">1498</td>
<td align="right">74</td>
<td align="left">10</td>
<td align="left">5</td>
</tr>
<tr>
<td align="left">Oxford Studies in Ancient Philosophy</td>
<td align="right">290</td>
<td align="right">43</td>
<td align="left">21</td>
<td align="left">20.3</td>
</tr>
<tr>
<td align="left">Philosophy East and West</td>
<td align="right">320</td>
<td align="right">66</td>
<td align="left">20</td>
<td align="left">15</td>
</tr>
<tr>
<td align="left">Phronesis</td>
<td align="right">388</td>
<td align="right">38</td>
<td align="left">24</td>
<td align="left">25</td>
</tr>
<tr>
<td align="left">The Journal of Aesthetics and Art Criticism</td>
<td align="right">611</td>
<td align="right">93</td>
<td align="left">29</td>
<td align="left">27</td>
</tr>
<tr>
<td align="left">The Philosophical Quarterly</td>
<td align="right">2305</td>
<td align="right">77</td>
<td align="left">14</td>
<td align="left">5</td>
</tr>
</tbody>
</table>
<h1 id="rounders-removed">Rounders Removed</h1>
<p>I mentioned that some of the numbers look suspiciously round. Maybe 10% of submissions to <em>MIND</em> really were from women, compared with 5% of accepted papers. But some of these cases probably involve non-trivial rounding, maybe even eyeballing or guesstimating. So let’s see how things look without them.</p>
<p>If we omit journals where both percentages are round (integer multiples of 5), that leaves ten journals. And the gap from before is even more pronounced: 16.3% of submissions from women compared with 22.9% of accepted papers (<em>p</em> = 0.0000003).</p>
<p>But it’s still a few, high-volume journals driving the result: <em>BJPS</em> and <em>EJP</em> do a ton of business, and each has a large gap. So much so that they’re able to overcome the opposite contribution of <em>Phil Quarterly</em> (which does a mind-boggling amount of business!).</p>
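<p>The “round numbers” filter is simple enough to sketch. The percentages below are copied from the table above, with some journal names abbreviated; the helper name is mine. The sketch only identifies which journals get dropped, since recomputing the pooled percentages would need the per-journal counts by gender, which the table doesn’t show.</p>

```python
# (journal, % of submissions from women, % of accepted papers from women)
rows = [
    ("Ancient Philosophy", 20, 24),
    ("BJPS", 15, 27),
    ("CJP", 20, 21),
    ("Dialectica", 12.05, 15.48),
    ("EJP", 11.84, 25),
    ("Hume Studies", 23.7, 58.1),
    ("Journal of Applied Philosophy", 20, 20),
    ("Journal of Political Philosophy", 35, 30),
    ("MIND", 10, 5),
    ("Oxford Studies in Ancient Philosophy", 21, 20.3),
    ("Philosophy East and West", 20, 15),
    ("Phronesis", 24, 25),
    ("The Journal of Aesthetics and Art Criticism", 29, 27),
    ("Phil Quarterly", 14, 5),
]

def is_round(pct):
    """True if the percentage is an integer multiple of 5."""
    return pct % 5 == 0

dropped = [name for name, sub, acc in rows if is_round(sub) and is_round(acc)]
kept = [name for name, sub, acc in rows if not (is_round(sub) and is_round(acc))]

print(len(kept), "journals kept")
print("dropped:", dropped)
```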
<h1 id="editors-anonymous">Editors Anonymous</h1>
<p>Naturally I fell to wondering how these big journals differ in their editorial practices. What are they doing differently that leads to such divergent results?</p>
<p>One thing the data tell us is which journals practice fully anonymous review, with even the editors ignorant of the author’s identity. That narrows it down to just three journals: <em>CJP</em>, <em>Dialectica</em>, and <em>Phil Quarterly</em>.<sup class="footnote-ref" id="fnref:3"><a rel="footnote" href="#fn:3">3</a></sup> The tallies then are:</p>
<table>
<thead>
<tr>
<th align="left"></th>
<th align="right">Accepted submissions</th>
<th align="right">Rejected submissions</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">Men</td>
<td align="right">240</td>
<td align="right">3103</td>
</tr>
<tr>
<td align="left">Women</td>
<td align="right">43</td>
<td align="right">537</td>
</tr>
</tbody>
</table>
<p>And now the gap is gone: 14.8% of submissions from women, compared with 15.2% of accepted papers—not a statistically significant difference (<em>p</em> = 0.91). That makes it look like the gap is down to editors’ decisions being influenced by knowledge of the author’s gender (whether deliberately or unconsciously).</p>
<p>But notice again, <em>Phil Quarterly</em> is still a huge part of this story. It’s their high volume and unusually negative differential that compensates for the more modest, positive differentials at <em>CJP</em> and <em>Dialectica</em>. So I still want to know more about <em>Phil Quarterly</em>, and what might explain their unusually negative differential.</p>
<p><strong>Edit</strong>: editors at <em>CJP</em> and <em>Phil Quarterly</em> kindly wrote with the following, additional information.</p>
<p>At <em>CJP</em>, the author’s identity is withheld from the editors while they decide whether to send the paper for external review, but then their identity is revealed (presumably to avoid inviting referees who are unacceptably close to the author—e.g. those identical to the author).</p>
<p>And chairman of <em>Phil Quarterly</em>’s editorial board, Jessica Brown, writes:</p>
<blockquote>
<ol>
<li>the PQ is very aware of issues about the representation of women, unsurprisingly given that the editorial board consists of myself, Sarah Broadie and Sophie-Grace Chappell. We monitor data on submissions by women and papers accepted in the journal every year.</li>
<li>the PQ has for many years had fully anonymised processing including the point at which decisions on papers are made (i.e. accept, reject, R and R etc). So, when we make such decisions we have no idea of the identity of the author.</li>
<li><p>While in some years the data has concerned us, more recently the figures do look better which is encouraging:</p>
<ul>
<li>16-17: 25% declared female authored papers accepted; 16% submissions</li>
<li>15-16: 14% accepted; 15% submissions</li>
<li>14-15: 16% accepted; 16% submissions</li>
</ul></li>
</ol>
</blockquote>
<h1 id="a-gruesome-conclusion">A Gruesome Conclusion</h1>
<p>In the end, I don’t see a clear lesson here. Before drawing any conclusions from the aggregated, cross-journal tallies, it seems we’d need to know more about the policies and practices of the journals driving them. Otherwise we’re liable to be misled to a false generalization about a heterogeneous group.</p>
<p>Some of that policy-and-practice information is probably publicly available; I haven’t had a chance to look. And I bet a lot of it is available informally, if you just talk to the right people. So this data-set could still be informative on our base-rate question. But sadly, I don’t think I’m currently in a position to make informative use of it.</p>
<p><img src="http://i.imgur.com/ojvPBaY.jpg" alt="" /></p>
<h1 id="technical-note">Technical Note</h1>
<p>This post was written in R Markdown and the source is <a href="https://github.com/jweisber/rgo/blob/master/apa bpa data/apa_bpa_data.Rmd" target="_blank">available on GitHub</a>.</p>
<div class="footnotes">
<hr />
<ol>
<li id="fn:1">No, I don’t mean <em>Ergo</em>! We published our first issue in 2014 while the survey covers mainly 2011–13.
<a class="footnote-return" href="#fnref:1"><sup>[return]</sup></a></li>
<li id="fn:2"><strong>Edit</strong>: the parity line was solid blue originally. But that misled some people into reading it as a fitted line. For reference and posterity, <a href="http://jonathanweisberg.org/img/apa_bpa_data_files/unnamed-chunk-3-2.png">the original image is here</a>.
<a class="footnote-return" href="#fnref:2"><sup>[return]</sup></a></li>
<li id="fn:3">That’s if we continue to exclude journals with very round numbers. Adding these journals back in doesn’t change the following result, though.
<a class="footnote-return" href="#fnref:3"><sup>[return]</sup></a></li>
</ol>
</div>
Accuracy for Dummies, Part 6: Obtusity
http://jonathanweisberg.org/post/Accuracy%20for%20Dummies%20-%20Part%206%20-%20Obtusity/
Wed, 24 May 2017 00:00:00 -0500
<p><a href="http://jonathanweisberg.org/post/Accuracy for Dummies - Part 5 - Convexity/">Last time</a> we saw that the set of probability assignments is <em>convex</em>. Today we’re going to show that convex sets have a special sort of “obtuse” relationship with outsiders. Given a point <em>outside</em> a convex set, there is always a point <em>in</em> the set that forms a right-or-obtuse angle with it.</p>
<p>Recall our 2D diagram from the first post. The convex set of interest here is the diagonal line segment from $(0,1)$ to $(1,0)$:</p>
<p><img src="http://jonathanweisberg.org/img/accuracy/2D Dominance Diagram - 400px.png" alt="" /></p>
<p>For any point outside the diagonal, like $c^* $, there is a point like $c'$ on it that forms a right angle with all other points on the diagonal. As a result, $c'$ is closer to all other points on the diagonal than $c^* $ is. In particular, $c'$ is closer to both vertices, so it’s always more accurate than $c^*$. It’s “closer to the truth”.</p>
<p>The insider point $c'$ that we used in this case is the closest point on the diagonal to $c^*$. That’s what licenses the right-triangle reasoning here. Today we’re generalizing this strategy to $n$ dimensions.</p>
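<p>Here is the 2D case as a small sketch, with hypothetical coordinates for $c^*$ (the diagram doesn’t commit to any). It assumes the closest point on the line $x + y = 1$ actually lands on the segment between the two vertices, which it does for this example.</p>

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def sub(a, b):
    return tuple(ai - bi for ai, bi in zip(a, b))

def sq_dist(a, b):
    d = sub(a, b)
    return dot(d, d)

c_star = (0.75, 0.75)  # hypothetical credences; they sum to 1.5, so c* is off the diagonal

# Closest point on the line x + y = 1: shift both coordinates by the same amount.
shift = (1 - sum(c_star)) / 2
c_prime = (c_star[0] + shift, c_star[1] + shift)

for vertex in [(1, 0), (0, 1)]:
    right_angle = dot(sub(c_star, c_prime), sub(vertex, c_prime)) == 0
    closer = sq_dist(c_prime, vertex) < sq_dist(c_star, vertex)
    print(vertex, right_angle, closer)
```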
<p>To do that, we need some tools for reasoning about $n$-dimensional geometry.$
\renewcommand{\vec}[1]{\mathbf{#1}}
\newcommand{\x}{\vec{x}}
\newcommand{\y}{\vec{y}}
\newcommand{\z}{\vec{z}}
\newcommand{\B}{B}
$</p>
<h1 id="arithmetic-with-arrows">Arithmetic with Arrows</h1>
<p>You’re familiar with arithmetic in one dimension: adding, subtracting, and multiplying single numbers. What about points in $n$ dimensions?</p>
<p>We introduced two ideas for arithmetic with points last time. We’ll add a few more today, and also talk about what they mean geometrically.</p>
<p>Suppose you have two points $\x$ and $\y$ in $n$ dimensions:
$$
\begin{align}
\x &= (x_1, \ldots, x_n),\\<br />
\y &= (y_1, \ldots, y_n).
\end{align}
$$
Their sum $\x + \y$, as we saw last time, is defined as follows:
$$ \x + \y = (x_1 + y_1, \ldots, x_n + y_n). $$
In other words, points are added coordinate-wise.</p>
<p>This definition has a natural, geometric meaning we didn’t mention last time. Start by thinking of $\x$ and $\y$ as <em>vectors</em>, that is, as arrows pointing from the origin to the points $\x$ and $\y$. Then $\x + \y$ just amounts to placing the tail of one arrow at the tip of the other and taking the point where the second arrow ends:
<img src="http://jonathanweisberg.org/img/accuracy/VectorAddition.png" alt="" />
(Notice that we’re continuing our usual practice of bold letters for points/vectors like $\x$ and $\y$, and italics for single numbers like $x_1$ and $y_3$.)</p>
<p>You can also multiply a vector $\x$ by a single number, $a$. The definition is once again coordinate-wise:
$$ a \x = (a x_1, \ldots, a x_n). $$
And again there’s a natural, geometric meaning. We’ve lengthened the vector $\x$ by a factor of $a$.
<img src="http://jonathanweisberg.org/img/accuracy/VectorMultiplication.png" alt="" />
Notice that if $a$ is between $0$ and $1$, then “lengthening” is actually shortening. For example, multiplying a vector by $a = 1/ 2$ makes it half as long.</p>
<p>If $a$ is negative, then multiplying by $a$ reverses the direction of the arrow. For example, multiplying the northeasterly arrow $(1,1)$ by $-1$ yields the southwesterly arrow pointing to $(-1,-1)$.</p>
<p>That means we can define subtraction in terms of addition and multiplication by negative one (just as with single numbers):
$$
\begin{align}
\x - \y &= \x + (-1 \times \y)\\<br />
&= (x_1 - y_1, \ldots, x_n - y_n).
\end{align}
$$
So vector subtraction amounts to coordinate-wise subtraction.</p>
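<p>If it helps to see these operations run, here is a minimal sketch in Python. The function names <code>add</code>, <code>scale</code>, and <code>sub</code> are illustrative choices, not anything from this series, and vectors are represented as plain tuples.</p>

```python
# Illustrative helpers (names are assumptions): vectors are tuples of numbers.

def add(x, y):
    """Coordinate-wise sum x + y."""
    return tuple(xi + yi for xi, yi in zip(x, y))

def scale(a, x):
    """Multiply every coordinate of x by the single number a."""
    return tuple(a * xi for xi in x)

def sub(x, y):
    """x - y, defined as x + (-1 * y)."""
    return add(x, scale(-1, y))

print(add((1, 2), (3, 1)))    # (4, 3)
print(scale(0.5, (2, 4)))     # (1.0, 2.0)
print(scale(-1, (1, 1)))      # (-1, -1)
print(sub((5, 0), (1, 1)))    # (4, -1)
```

<p>Defining <code>sub</code> through <code>add</code> and <code>scale</code> mirrors the derivation above: subtraction is not a new primitive, just addition of a reversed arrow.</p>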
<p>But what about multiplying two vectors? That’s actually different from what you might expect! We don’t just multiply coordinate-wise. We do that <strong>and then add up the results</strong>:
$$ \x \cdot \y = x_1 y_1 + \ldots + x_n y_n. $$
So the product of two vectors is <strong>not a vector</strong>, but a number. That number is called the <em>dot product</em>, $\x \cdot \y$.</p>
<p>Why are dot products defined this way? Why do we add up the results of coordinate-wise multiplication to get a single number? Because it yields a more useful extension of the concept of multiplication from single numbers to vectors. We’ll see part of that in a moment, in the geometric meaning of the dot product.</p>
<p>(There’s an algebraic side to the story too, having to do with the axioms that characterize the real numbers—<a href="https://en.wikipedia.org/wiki/Field_(mathematics)" target="_blank">the field axioms</a>. We won’t go into that, but it comes out in <a href="http://www.youtube.com/watch?v=63HpaUFEtXY&t=8m28s" target="_blank">this bit</a> of a beautiful lecture by Francis Su, especially around <a href="http://www.youtube.com/watch?v=63HpaUFEtXY&t=11m45s" target="_blank">the 11:45 mark</a>.)</p>
<h1 id="signs-and-their-significance">Signs and Their Significance</h1>
<p>In two dimensions, a right angle has a special algebraic property: the dot product of two arrows making the angle is always zero.</p>
<p>Imagine a right triangle at the origin, with one leg going up to the point $(0,1)$ and the other leg going out to $(1,0)$:
<img src="http://jonathanweisberg.org/img/accuracy/VectorRightAngle.png" alt="" />
The dot product of those two vectors is $(1,0) \cdot (0,1) = 1 \times 0 + 0 \times 1 = 0$. One more example: consider the right angle formed by the vectors $(-3,3)$ and $(1,1)$.
<img src="http://jonathanweisberg.org/img/accuracy/VectorRightAngle2.png" alt="" />
Again, the dot product is $(-3,3) \cdot (1,1) = -3 \times 1 + 3 \times 1 = 0.$</p>
<p>Going a bit further: the dot product is always positive for acute angles, and negative for obtuse angles. Take the vectors $(5,0)$ and $(-1,1)$:
<img src="http://jonathanweisberg.org/img/accuracy/VectorObtuseAngle.png" alt="" />
Then we have $(5,0) \cdot (-1,1) = -5$. Whereas for $(5,0)$ and $(1,1)$:
<img src="http://jonathanweisberg.org/img/accuracy/VectorAcuteAngle.png" alt="" />
we find $(5,0) \cdot (1,1) = 5$.</p>
<p>So the sign of the dot product reflects the angle formed by the vectors $\x$ and $\y$:</p>
<ul>
<li>acute angle: $\x \cdot \y > 0$,</li>
<li>right angle: $\x \cdot \y = 0$,</li>
<li>obtuse angle: $\x \cdot \y < 0$.</li>
</ul>
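<p>The sign rule can be checked on the four examples from this section with a short Python sketch. The names <code>dot</code> and <code>angle_type</code> are made up for illustration. The exact test against zero is safe here only because the coordinates are integers. With floating-point inputs you would want a small tolerance.</p>

```python
def dot(x, y):
    """Multiply coordinate-wise, then add up the results."""
    return sum(xi * yi for xi, yi in zip(x, y))

def angle_type(x, y):
    """Classify the angle between two nonzero vectors by the sign of x . y."""
    d = dot(x, y)
    if d > 0:
        return "acute"
    if d == 0:
        return "right"
    return "obtuse"

print(dot((1, 0), (0, 1)), angle_type((1, 0), (0, 1)))     # 0 right
print(dot((-3, 3), (1, 1)), angle_type((-3, 3), (1, 1)))   # 0 right
print(dot((5, 0), (-1, 1)), angle_type((5, 0), (-1, 1)))   # -5 obtuse
print(dot((5, 0), (1, 1)), angle_type((5, 0), (1, 1)))     # 5 acute
```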
<p>That’s going to be key in generalizing to $n$ dimensions, where reasoning with diagrams breaks down. But first, one last bit of groundwork.</p>
<h1 id="algebra-with-arrows">Algebra with Arrows</h1>
<p>You can check pretty easily that vector addition and multiplication behave a lot like ordinary addition and multiplication. The usual laws of commutativity, associativity, and distribution hold:</p>
<ul>
<li>$\x + \y = \y + \x$.</li>
<li>$\x + (\y + \z) = (\x + \y) + \z$.</li>
<li>$a ( \x + \y) = a\x + a\y$.</li>
<li>$\x \cdot \y = \y \cdot \x$.</li>
<li>$\x \cdot (\y + \z) = \x \cdot \y + \x \cdot \z$.</li>
<li>$a (\x \cdot \y) = (a \x) \cdot \y = \x \cdot (a \y)$.</li>
</ul>
<p>One notable consequence, which we’ll use below, is the analogue of the familiar <a href="https://en.wikipedia.org/wiki/FOIL_method" target="_blank">“FOIL method”</a> from high school algebra:
$$
\begin{align}
(\x - \y)^2 &= (\x - \y) \cdot (\x - \y)\\<br />
&= \x^2 - 2 \x \cdot \y + \y^2.
\end{align}
$$
We’ll also make use of the fact that the Brier distance between $\x$ and $\y$ can be written $(\x - \y)^2$. Why?</p>
<p>Let’s write $\B(\x,\y)$ for the Brier distance between points $\x$ and $\y$. Recall the definition of Brier distance, which is just the square of Euclidean distance:
$$ \B(\x,\y) = (x_1 - y_1)^2 + (x_2 - y_2)^2 + \ldots + (x_n - y_n)^2. $$
Now consider that, thanks to our definition of vector subtraction:
$$ \x - \y = (x_1 - y_1, x_2 - y_2, \ldots, x_n - y_n). $$
And thanks to the definition of the dot product:
$$ (\x - \y) \cdot (\x - \y) = (x_1 - y_1)^2 + (x_2 - y_2)^2 + \ldots + (x_n - y_n)^2. $$
So $\B(\x, \y) = (\x - \y) \cdot (\x - \y)$, in other words:
$$ \B(\x, \y) = (\x - \y)^2. $$</p>
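<p>As a quick numerical sanity check (not a proof), the sketch below computes the Brier distance three ways for one arbitrarily chosen pair of points: from the definition, as the dot product of $\x - \y$ with itself, and through the FOIL expansion. The helper names and the sample points are assumptions made for the example.</p>

```python
def dot(x, y):
    """Dot product: coordinate-wise products, summed."""
    return sum(xi * yi for xi, yi in zip(x, y))

def brier(x, y):
    """Brier distance: the sum of squared coordinate differences."""
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))

x, y = (3, 1, 2), (1, 4, 2)
diff = tuple(xi - yi for xi, yi in zip(x, y))    # x - y = (2, -3, 0)

print(brier(x, y))                               # 13
print(dot(diff, diff))                           # 13
print(dot(x, x) - 2 * dot(x, y) + dot(y, y))     # 13
```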
<h1 id="a-cute-lemma">A Cute Lemma</h1>
<p>Now we can prove the lemma that’s the aim of this post. For the intuitive idea, picture a convex set $S$ in the plane, like a pentagon. Then choose an arbitrary point $\x$ outside that set:
<img src="http://jonathanweisberg.org/img/accuracy/ObtusityLemma.png" alt="" />
Now trace a straight line from $\x$ to the closest point of the convex region, $\y$:
<img src="http://jonathanweisberg.org/img/accuracy/ObtusityLemma2.png" alt="" />
Finally, trace another straight line to any other point $\z$ of $S$:
<img src="http://jonathanweisberg.org/img/accuracy/ObtusityLemma3.png" alt="" />
No matter what point we choose for $\z$, the angle formed will either be right or obtuse. It cannot be acute.</p>
<p><strong>Lemma.</strong> Let $S$ be a convex set of points in $\mathbb{R}^n$. Let $\x \not \in S$, and let $\y \in S$ minimize $\B(\y, \x)$ as a function of $\y$ on the domain $S$. Then for any $\z \in S$,
$$ (\x - \y) \cdot (\z - \y) \leq 0. $$</p>
<p>Let’s pause to understand what the Lemma is saying before we dive into the proof.</p>
<p>Focus on the centered inequality. It’s about the vectors $\x - \y$ and $\z - \y$. These are the arrows pointing from $\y$ to $\x$, and from $\y$ to $\z$. So in terms of our original two dimensional diagram with the triangle:
<img src="http://jonathanweisberg.org/img/accuracy/2D Dominance Diagram - 400px.png" alt="" />
we’re looking at the angle between $c^*$, $c'$, and any point on the diagonal you like… which includes the ones we’re especially interested in, the vertices. What the Lemma tells us is that this angle is always at least a right angle.</p>
<p>Of course, it’s exactly a right angle in this case, not an obtuse one. That’s because our convex region is just the diagonal line. But the Lemma could also be applied to the whole triangular region in the diagram. That’s a convex set too. And if we took a point inside the triangle as our third point, the angle formed would be obtuse. (This is actually important if you want to generalize the dominance theorem beyond what we’ll prove next time. But for us it’s just a mathematical extra.)</p>
<p>Now let’s prove the Lemma.</p>
<p><em>Proof.</em> Because $S$ is convex and $\y$ and $\z$ are in $S$, any mixture of $\y$ and $\z$ must also be in $S$. That is, every point $\lambda \z + (1-\lambda) \y$ is in $S$, given $0 \leq \lambda \leq 1$.</p>
<p>Notice that we can rewrite $\lambda \z + (1-\lambda) \y$ as follows:
$$ \lambda \z + (1-\lambda) \y = \y + \lambda(\z - \y). $$
We’ll use this fact momentarily.</p>
<p>Now, by hypothesis $\y$ is at least as close to $\x$ as any other point of $S$ is. So, in particular, $\y$ is at least as close to $\x$ as the mixtures of $\y$ and $\z$ are. Thus, for any given $\lambda \in [0,1]$:
$$ \B(\y,\x) \leq \B(\lambda \z + (1-\lambda) \y, \x). $$
Using algebra, we can transform the right-hand side as follows:
$$
\begin{align}
\B(\lambda \z + (1-\lambda) \y, \x) &= \B(\x, \lambda \z + (1-\lambda) \y)\\<br />
&= \B(\x, \y + \lambda(\z - \y))\\<br />
&= (\x - (\y + \lambda(\z - \y)))^2\\<br />
&= ((\x - \y) - \lambda(\z - \y))^2\\<br />
&= (\x - \y)^2 + \lambda^2(\z - \y)^2 - 2\lambda(\x - \y) \cdot (\z - \y)\\<br />
&= \B(\x,\y) + \lambda^2\B(\z,\y) - 2\lambda(\x - \y) \cdot (\z - \y).
\end{align}
$$
Combining this equation with the previous inequality, we have:
$$ \B(\y,\x) \leq \B(\x,\y) + \lambda^2\B(\z,\y) - 2\lambda(\x - \y) \cdot (\z - \y). $$
And because $\B(\y, \x) = \B(\x, \y)$, this becomes:<br />
$$ 0 \leq \lambda^2\B(\z,\y) - 2\lambda(\x - \y) \cdot (\z - \y). $$
If we then restrict our attention to $\lambda > 0$, we can divide and rearrange terms to get:
$$ (\x - \y) \cdot (\z - \y) \leq \frac{\lambda\B(\z,\y)}{2}. $$
And since this inequality holds no matter how small $\lambda$ is, it follows that
$$ (\x - \y) \cdot (\z - \y) \leq 0, $$
as desired.
<span class="floatright">$\Box$</span></p>
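<p>For readers who like to see a claim tested numerically, here is a hedged spot-check of the Lemma in Python, using the diagonal from $(0,1)$ to $(1,0)$ as the convex set. It assumes the standard closed form for the closest point on a line segment (project onto the line, then clamp to the segment). The function names and the two sample points are illustrative. Sampling finitely many $\z$ only illustrates the inequality and is no substitute for the proof.</p>

```python
def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))

def sub(x, y):
    return tuple(xi - yi for xi, yi in zip(x, y))

def closest_on_segment(x, a, b):
    """Closest point to x on the segment from a to b: project, then clamp to [0, 1]."""
    ab = sub(b, a)
    t = dot(sub(x, a), ab) / dot(ab, ab)
    t = max(0.0, min(1.0, t))
    return tuple(ai + t * di for ai, di in zip(a, ab))

def largest_lemma_value(x, a, b, steps=100):
    """Largest value of (x - y) . (z - y) over sampled points z on the segment."""
    y = closest_on_segment(x, a, b)
    zs = [tuple(ai + (k / steps) * (bi - ai) for ai, bi in zip(a, b))
          for k in range(steps + 1)]
    return max(dot(sub(x, y), sub(z, y)) for z in zs)

a, b = (0.0, 1.0), (1.0, 0.0)                    # the diagonal from the 2D diagram
print(largest_lemma_value((0.9, 0.6), a, b))     # about 0: every angle is a right angle
print(largest_lemma_value((1.5, -0.2), a, b))    # never positive: angles are obtuse
```

<p>In the first case the closest point lies in the interior of the diagonal, so the angle is exactly right (up to rounding). In the second the closest point is the vertex $(1,0)$, and every other point on the diagonal makes an obtuse angle.</p>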
<h1 id="taking-stock">Taking Stock</h1>
<p>Here’s what we’ve got from this post and the last one:</p>
<ul>
<li>Last time: the set of probability functions $P$ is convex.</li>
<li>This time: given a point $\x$ outside $P$, there’s a point $\y$ inside $P$ that forms a right-or-obtuse angle with every other point $\z$ in $P$.</li>
</ul>
<p>Intuitively, it should follow that:</p>
<ul>
<li>$\y$ is closer to every $\z$ in $P$ than $\x$ is.</li>
</ul>
<p>And indeed, that’s what we’ll show in the next post!</p>
Accuracy for Dummies, Part 5: Convexity
http://jonathanweisberg.org/post/Accuracy%20for%20Dummies%20-%20Part%205%20-%20Convexity/
Thu, 18 May 2017 10:35:00 -0500http://jonathanweisberg.org/post/Accuracy%20for%20Dummies%20-%20Part%205%20-%20Convexity/
<p>In this and the next two posts we’ll establish the central theorem of the accuracy framework. We’ll show that the laws of probability are specially suited to the pursuit of accuracy, measured in Brier distance.</p>
<p>We showed this for cases with two possible outcomes, like a coin toss, way back in <a href="http://jonathanweisberg.org/post/Accuracy for Dummies - Part 1/">the first post of this series</a>. A simple, <a href="http://jonathanweisberg.org/img/accuracy/2D Dominance Diagram - 400px.png">two-dimensional diagram</a> was all we really needed for that argument. To see how the same idea extends to any number of dimensions, we need to generalize the key ingredients of that reasoning to $n$ dimensions.</p>
<p>This post supplies the first ingredient: the convexity theorem.</p>
<h1 id="convex-shapes">Convex Shapes</h1>
<p>Convex shapes are central to the accuracy framework because, in a way, the laws of probability have a convex shape. Hopefully that mystical pronouncement will make sense by the end of this post.</p>
<p>You probably know a convex shape when you see one. Circles, triangles, and octagons are convex; pentagrams and the state of Texas are not.</p>
<p>But what makes a convex shape convex? Roughly: <em>it contains all its connecting lines</em>. If you take any two points in a convex region and draw a line connecting them, the line will lie entirely inside that region.</p>
<p>But on a non-convex figure, you can find points whose connecting line leaves the figure’s boundary:</p>
<p><img src="http://jonathanweisberg.org/img/accuracy/TexasLine.png" alt="" /></p>
<p>We want to take this idea beyond two dimensions, though. And for that, we need to generalize the idea of connecting lines. We need the concept of a “mixture”.</p>
<h2 id="pointy-arithmetic">Pointy Arithmetic</h2>
<p>In two dimensions it’s pretty easy to see that if you take some percentage of one point, and a complementary percentage of another point, you get a third point on the line between them.$
\renewcommand{\vec}[1]{\mathbf{#1}}
\newcommand{\p}{\vec{p}}
\newcommand{\q}{\vec{q}}
\newcommand{\r}{\vec{r}}
\newcommand{\v}{\vec{v}}
\newcommand{\R}{\mathbb{R}}
$</p>
<p>For example, if you take $1/ 2$ of $(0,0)$ and add it to $1/ 2$ of $(1,1)$, you get the point halfway between: $(1/ 2,1/ 2)$. That’s pretty intuitive geometrically:
<img src="http://jonathanweisberg.org/img/accuracy/Fig1.png" alt="" />
But we can capture the idea algebraically too:
$$
\begin{align}
1/ 2 \times (0,0) + 1/ 2 \times (1,1)
&= (0,0) + (1/ 2, 1/ 2)\\<br />
&= (1/ 2, 1/ 2).
\end{align}
$$</p>
<p>Likewise, if you add $3/10$ of $(0,0)$ to $7/10$ of $(1, 1)$, you get the point seven-tenths of the way in between, namely $(7/10, 7/10)$:
<img src="http://jonathanweisberg.org/img/accuracy/Fig2.png" alt="" />
In algebraic terms:
$$
\begin{align}
3/10 \times (0,0) + 7/10 \times (1,1)
&= (0,0) + (7/10, 7/10)\\<br />
&= (7/10, 7/10).
\end{align}
$$</p>
<p>Notice that we just introduced two rules for doing arithmetic with points. When multiplying a point $\p = (p_1, p_2)$ by a number $a$, we get:
$$ a \p = (a p_1, a p_2). $$
And when adding two points $\p = (p_1, p_2)$ and $\q = (q_1, q_2)$ together:
$$ \p + \q = (p_1 + q_1, p_2 + q_2). $$
In other words, multiplying a point by a single number works element-wise, and so does adding two points together.</p>
<p>We can generalize these ideas straightforwardly to any number of dimensions $n$. Given points $\p = (p_1, p_2, \ldots, p_n)$ and $\q = (q_1, q_2, \ldots, q_n)$, we can define:
$$ a \p = (a p_1, a p_2, \ldots, a p_n), $$
and
$$ \p + \q = (p_1 + q_1, p_2 + q_2, \ldots, p_n + q_n).$$
We’ll talk more about arithmetic with points next time. For now, these two definitions will do.</p>
<h2 id="mixtures">Mixtures</h2>
<p>Now back to connecting lines between points. The idea is that the straight line between $\p$ and $\q$ is the set of points we get by “mixing” some portion of $\p$ with some portion of $\q$.</p>
<p>We take some number $\lambda$ between $0$ and $1$, we multiply $\p$ by $\lambda$ and $\q$ by $1 - \lambda$, and we sum the results: $\lambda \p + (1-\lambda) \q$. The set of points you can obtain this way is the straight line between $\p$ and $\q$.</p>
<p>In fact, you can mix any number of points together. Given $m$ points $\q_1, \ldots, \q_m$, we can define their <em>mixture</em> as follows. Let $\lambda_1, \ldots, \lambda_m$ be nonnegative real numbers that sum to one. That is:</p>
<ul>
<li>$\lambda_i \geq 0$ for all $i$, and</li>
<li>$\lambda_1 + \lambda_2 + \ldots + \lambda_m = 1$.</li>
</ul>
<p>Then we multiply each $\q_i$ by the corresponding $\lambda_i$ and sum up:
$$ \p = \lambda_1 \q_1 + \ldots + \lambda_m \q_m. $$
The resulting point $\p$ is a <em>mixture</em> of the $\q_i$’s.</p>
<p>Now we can define the general notion of a <em>convex set</em> of points. A convex set is one where every mixture of points in the set is also contained in the set. (A convex set is “closed under mixing”, you might say.)</p>
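<p>The definition of a mixture translates almost directly into code. Here is a minimal Python sketch. The function name <code>mix</code> and the rounding tolerance are assumptions for the example, and points are represented as tuples.</p>

```python
def mix(points, weights):
    """Mixture of points: weights must be nonnegative and sum to one."""
    if not all(w >= 0 for w in weights):
        raise ValueError("weights must be nonnegative")
    if abs(sum(weights) - 1) > 1e-9:          # tolerance for floating-point rounding
        raise ValueError("weights must sum to one")
    n = len(points[0])
    # Scale each point by its weight, then add the results coordinate-wise.
    return tuple(sum(w * p[i] for w, p in zip(weights, points)) for i in range(n))

print(mix([(0, 0), (1, 1)], [0.5, 0.5]))    # (0.5, 0.5)
print(mix([(0, 0), (1, 1)], [0.3, 0.7]))    # (0.7, 0.7)
```

<p>The two calls reproduce the halfway and seven-tenths examples from earlier in the post.</p>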
<h1 id="convex-hulls">Convex Hulls</h1>
<p>It turns out that the set of possible probability assignments is convex.</p>
<p>More than that, it’s the convex set generated by the possible truth-value assignments, in a certain way. It’s the “convex hull” of the possible truth-value assignments.</p>
<p>What in the world is a “convex hull”?</p>
<p>Imagine some points in the plane—the corners of a square, for example. Now imagine stretching a rubber band around those points and letting it snap tight. The shape you get is the square with those points as corners. And the set of points enclosed by the rubber band is a convex set. Take any two points inside the square, or on its boundary, and draw the straight line between them. The line will not leave the square.</p>
<p>Intuitively, the convex hull of a set of points in the plane is the set enclosed by the rubber band exercise. Formally, the convex hull of a set of points is the set of points that can be obtained from them as a mixture. (And this definition works in any number of dimensions.)</p>
<p>For example, any of the points in our square example can be obtained by taking a mixture of the vertices. Take the center of the square: it’s halfway between the bottom left and top right corners. To get something to the left of that we can mix in some of the top left corner (and correspondingly less of the top right). And so on.</p>
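<p>To make this concrete, here is a worked example under an assumed choice of coordinates (the post doesn’t fix any): let the corners be $(0,0)$, $(1,0)$, $(0,1)$, and $(1,1)$. The center is an even mixture of the bottom left and top right corners, and shifting half of the top right corner’s weight over to the top left corner moves the result to the left:
$$
\begin{align}
\frac{1}{2}(0,0) + \frac{1}{2}(1,1) &= \left(\frac{1}{2}, \frac{1}{2}\right),\\
\frac{1}{2}(0,0) + \frac{1}{4}(0,1) + \frac{1}{4}(1,1) &= \left(\frac{1}{4}, \frac{1}{2}\right).
\end{align}
$$
In both lines the weights are nonnegative and sum to one, so both results are mixtures of the corners.</p>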
<p>Now imagine the rubber band exercise using the possible truth-value assignments, instead of the corners of a square. In two dimensions, those are the points $(0,1)$ and $(1,0)$. And when you let the band snap tight, you get the diagonal line connecting them. As we saw way back in <a href="http://jonathanweisberg.org/post/Accuracy for Dummies - Part 1/">our first post</a>, the points on that diagonal line are the possible probability assignments.</p>
<h1 id="peeking-ahead">Peeking Ahead</h1>
<p>We also saw that if you take any point <em>not</em> on that diagonal, the closest point on the diagonal forms a right angle. That’s what lets us do some basic geometric reasoning to see that there’s a point on the line that’s closer to both vertices than the point off the line:</p>
<p><img src="http://jonathanweisberg.org/img/accuracy/2D Dominance Diagram - 400px.png" alt="" /></p>
<p>That fact about closest points and right angles is what’s going to enable us to generalize the argument beyond two dimensions. If you take any point not on a convex hull, there’s a point on the convex hull (namely the closest point) which forms a right (or obtuse) angle with the other points on the hull.</p>
<p>Consider the three dimensional case. The possible truth-value assignments are $(1,0,0)$, $(0,1,0)$, and $(0,0,1)$:
<img src="http://jonathanweisberg.org/img/accuracy/Three Vertices.png" alt="" />
And when you let a rubber band snap tight around them, it encloses the triangular surface connecting them:
<img src="http://jonathanweisberg.org/img/accuracy/Three Vertices with Hull.png" alt="" />
That’s the set of probability assignments for three outcomes.</p>
<p>Now take any point that’s not on that triangular surface. Drop a straight line to the closest point on the surface. Then draw another straight line from there to one of the triangle’s vertices. These two straight lines will form a right or obtuse angle. So the distance from the first, off-hull point to the vertex is greater than the distance from the second, on-hull point to the vertex.</p>
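<p>Here is one worked instance, with an off-hull point chosen purely for illustration. Take the point $(1,1,1)$. The closest point on the triangular surface is $(1/3, 1/3, 1/3)$, since the line between them runs in the direction $(1,1,1)$, which is perpendicular to the plane containing the triangle. Now pick the vertex $(1,0,0)$ and compare squared distances:
$$
\begin{align}
\text{from } (1,1,1) \text{ to } (1,0,0):&\quad 0^2 + 1^2 + 1^2 = 2,\\
\text{from } (1/3,1/3,1/3) \text{ to } (1,0,0):&\quad (2/3)^2 + (1/3)^2 + (1/3)^2 = 2/3,\\
\text{from } (1,1,1) \text{ to } (1/3,1/3,1/3):&\quad 3 \times (2/3)^2 = 4/3.
\end{align}
$$
Since $2 = 4/3 + 2/3$, the Pythagorean theorem tells us the angle at the on-hull point is a right angle. And the on-hull point is closer to the vertex than the off-hull point is: $2/3$ versus $2$.</p>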
<p>Essentially the same reasoning works in any number of dimensions. But to make it work, we need to do three things.</p>
<ol>
<li>Prove that the probability assignments always form a convex hull around the possible truth-value assignments.</li>
<li>Prove that, for any point outside a convex hull, the closest point on the hull forms a right angle (or an obtuse angle) with any other point on the hull.</li>
<li>Prove that the point off the hull is farther from all the vertices than the closest point on the hull is.</li>
</ol>
<p>This post is dedicated to the first item.</p>
<h1 id="the-convexity-theorem">The Convexity Theorem</h1>
<p>We’re going to prove that the set of possible probability assignments is the same as the convex hull of the possible truth-value assignments. First let’s get some notation in place.</p>
<h2 id="notation">Notation</h2>
<p>As usual $n$ is the number of possible outcomes under consideration. So each possible truth-value assignment is a point of $n$ coordinates, with a single $1$ and $0$ everywhere else. For example, if $n = 4$ then $(0, 0, 1, 0)$ represents the case where the third possibility obtains.</p>
<p>We’ll write $V$ for the set of all possible truth value assignments. And we’ll write $\v_1, \ldots, \v_n$ for the elements of $V$. The first element $\v_1$ has its $1$ in the first coordinate, $\v_2$ has its $1$ in the second coordinate, etc.</p>
<p>We’ll use a superscript $^+$ for the convex hull of a set. So $V^+$ is the convex hull of $V$. It’s the set of all points that can be obtained by mixing members of $V$.</p>
<p>Recall, a mixture is a point obtained by taking nonnegative real numbers $\lambda_1, \ldots, \lambda_n$ that sum to one, and multiplying each one against the corresponding $\v_i$ and then summing up:
$$ \lambda_1 \v_1 + \lambda_2 \v_2 + \ldots + \lambda_n \v_n. $$
So $V^+$ is the set of all points that can be obtained by this method. Each choice of values $\lambda_1, \ldots, \lambda_n$ generates a member of $V^+$. (To exclude one of the $\v_i$’s from a mixture, just set $\lambda_i = 0$.)</p>
<p>Finally, we’ll use $P$ for the set of all probability assignments. Recall: a probability assignment is a point of $n$ coordinates, where each coordinate is nonnegative, and all the coordinates together add up to one. That is, $\p = (p_1,\ldots,p_n)$ is a probability assignment just in case:</p>
<ul>
<li>$p_i \geq 0$ for all $i$, and</li>
<li>$p_1 + p_2 + \ldots + p_n = 1$.</li>
</ul>
<p>The set $P$ contains just those points $\p$ satisfying these two conditions.</p>
<h2 id="statement-and-proof">Statement and Proof</h2>
<p>In the notation just established, what we’re trying to show is that $V^+ = P$.</p>
<p><strong>Theorem.</strong> $V^+ = P$. That is, the convex hull of the possible truth-value assignments just is the set of possible probability assignments.</p>
<p><em>Proof.</em> Let’s first show that $V^+ \subseteq P$.</p>
<p>Notice that a truth-value assignment is also a probability assignment. Its coordinates are always $1$ or $0$, so all coordinates are nonnegative. And since it has only a single coordinate with value $1$, its coordinates add up to $1$.</p>
<p>But we have to show that any mixture of truth-value assignments is also a probability assignment. So let $\lambda_1, \ldots, \lambda_n$ be nonnegative numbers that sum to $1$. If we multiply $\lambda_i$ against a truth-value assignment $\v_i$, we get a point with $0$ in every coordinate except the $i$-th coordinate, which has value $\lambda_i$. For example, $\lambda_3 \times (0, 0, 1, 0) = (0, 0, \lambda_3, 0)$. So the mixture that results from $\lambda_1, \ldots, \lambda_n$ is:
$$
\lambda_1 \v_1 + \lambda_2 \v_2 + \ldots + \lambda_n \v_n = (\lambda_1, \lambda_2, \ldots, \lambda_n).
$$
And this mixture has coordinates that are all nonnegative and sum to $1$, by hypothesis. In other words, it is a probability assignment.</p>
<p>So we turn to showing that $P \subseteq V^+$. In other words, we want to show that every probability assignment can be obtained as a mixture of the $\v_i$’s.</p>
<p>So take an arbitrary probability assignment $\p \in P$, where $\p = (p_1, \ldots, p_n)$. Let the $\lambda_i$’s be the probabilities that $\p$ assigns to each $i$: $\lambda_1 = p_1$, $\lambda_2 = p_2$, and so on. Then, by the same logic as in the first part of the proof:
$$ \lambda_1 \v_1 + \ldots + \lambda_n \v_n = (p_1, \ldots, p_n). $$
In other words, $\p$ is a mixture of the possible truth-value assignments, where the weights in the mixture are just the probability values assigned by $\p$. <span style="float: right;">$\Box$</span></p>
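<p>The key step in both halves of the proof, that mixing the $\v_i$’s with weights $\lambda_i$ yields the point $(\lambda_1, \ldots, \lambda_n)$, can be illustrated with a small Python sketch. The helper names and the sample probability assignment are assumptions for the example. It checks one instance and does not replace the argument above.</p>

```python
def unit_vectors(n):
    """The n possible truth-value assignments: a single 1, zeros elsewhere."""
    return [tuple(1 if j == i else 0 for j in range(n)) for i in range(n)]

def mix(points, weights):
    """Weighted sum of points (weights assumed nonnegative, summing to one)."""
    n = len(points[0])
    return tuple(sum(w * p[i] for w, p in zip(weights, points)) for i in range(n))

p = (0.5, 0.25, 0.125, 0.125)        # an arbitrary probability assignment, n = 4
print(unit_vectors(4)[2])            # (0, 0, 1, 0)
print(mix(unit_vectors(4), p))       # (0.5, 0.25, 0.125, 0.125)
```

<p>Using the coordinates of <code>p</code> as the weights returns <code>p</code> itself, which is the second half of the proof in miniature.</p>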
<h1 id="up-next">Up Next</h1>
<p>We’ve established the first of the three items listed earlier. Next time we’ll establish the second: given a point outside a convex set, there’s always a point inside that forms a right or obtuse angle with any other point of the set. Then we’ll be just a few lines of algebra from the main result: the Brier dominance theorem!</p>
Journals as Ratings Agencies
http://jonathanweisberg.org/post/Journals%20as%20Ratings%20Agencies/
Thu, 30 Mar 2017 15:27:04 -0500http://jonathanweisberg.org/post/Journals%20as%20Ratings%20Agencies/
<p>Starting in July, philosophy’s two most prestigious journals won’t reject submitted papers anymore. Instead they’ll “grade” every submission, assigning a rating on the familiar letter-grade scale (A+, A, A-, B+, B, B-, etc.).</p>
<p>They will, in effect, become ratings agencies.</p>
<p>They’ll still publish papers. Those rated A- or higher can be published in the journal, if the authors want. Or they can seek another venue, if they think they can do better.</p>
<p>I just made that up. But imagine if it were true—especially if a bunch of journals did this. How would it change philosophy’s publication game?</p>
<p>Well we’d save a lot of wasted labour, for one thing. And we’d discourage frivolous submissions, for another.</p>
<h1 id="the-bad">The Bad</h1>
<p>Under the current arrangement, the system is sagging low under the weight of premature, mediocre, even low-quality submissions. (I’d say it’s even creaking and cracking.) Editors scrounge miserably for referees, and referees frantically churn out reports and recommendations, mostly for naught.</p>
<p>In a typical case, the editor rejects the submission and the referees’ reports are filed away in a database, never to be read again. Maybe the author makes substantial revisions, but very likely they don’t—especially if the paper’s main idea is the real limiting factor. The process repeats at another journal, often at several more journals. And in the end all the philosophical public sees is: accepted at <em>International Journal of Such & Such Studies</em>.</p>
<p>Of all the people who’ve read and assessed the paper by that point, only two have their assessments directly broadcast to the public. And even then, only the “two thumbs more-or-less up” part of the signal gets out.</p>
<p>Yet five, eight, or even ten people have weighed in on the paper by then. They’ve thought about its strengths and weaknesses, and they’ve generated valuable insights and assessments that could save others time and trouble. Yet only the handling editors and the authors get the direct benefit of that labour.</p>
<p>The current system even encourages authors to waste editors’ and referees’ time. Unless they’re in a rush, authors can start at the top of the journal-prestige hierarchy and work their way down. You don’t even have to perfect your paper before starting this incredibly inefficient process. With so many journals to try, you’ll basically get unlimited kicks at the can. So you might as well let the referees do your homework for you.</p>
<p>(This doesn’t apply to all authors, obviously. Some work in areas that severely limit their can-kicking. And many <em>are</em> in a rush, to get jobs and tenure.)</p>
<h1 id="the-good">The Good</h1>
<p>But, if a paper were publicly assigned a grade every place it was submitted, authors might be more realistic in deciding where to submit. They might also wait until their paper is truly ready for public consumption before imposing on editors and referees.</p>
<p>Readers would also benefit from seeing a paper’s transcript. Not only could it inform their decision about whether to read the paper, it could aid their sense of how its contribution is received by peers and experts.</p>
<p>Referees would also have better incentives, to take on referee work and to be more diligent about it. They would know that their labour would have a greater impact, and that their assessment would have a more lasting effect.</p>
<p>Editors could even limit submissions based on their grade-history, e.g. “no submissions already graded by two other journals”, or “no submissions with an average grade less than a B”. (Ideally, different journals would have different policies here, to allow some variety.)</p>
<h1 id="the-ugly">The Ugly</h1>
<p>Of course, several high-profile journals would have to take the lead to make this kind of thing happen. And there would have to be strong norms within the discipline about publicizing grades: requiring they be listed alongside the paper on CVs and websites, for example.</p>
<p>And there would be costs.</p>
<p>Everybody has their favourite story about the groundbreaking paper that got rejected five times, but was finally published in <em>The Posh Journal of Philosophy Review</em>, and has since been cited a gajillion times. Such papers could be weighed down by having their grade-transcripts publicized. (On the plus side, we could have a new genre of great paper: the cult classic!)</p>
<p>Also, some authors have to rely on referee feedback more than others, because of their limited philosophical networks. They’d likely find their papers with longer, more checkered grade-transcripts, exacerbating an existing injustice.</p>
<p>And, in the end, the present proposal might only be a band-aid. If there really is an oversubmission problem in academic philosophy (as I suspect there is), it’s probably caused by increased pressure to publish—because jobs are scarce, and administrators demand it, for example. Turning journals into ratings agencies wouldn’t relieve that pressure, even if it would help to manage some of its bad effects.</p>
<h1 id="decision-r-r">Decision: R&R</h1>
<p>In the end, I’m undecided about this proposal. I think it has some very attractive features, but the costs give me pause (much the same as the alternatives I’m aware of, like <a href="http://davidfaraci.com/populus" target="_blank">Populus</a>). I’m only certain that we can’t keep going as we have been; it won’t end well.</p>
Accuracy for Dummies, Part 4: Euclid in the Round
http://jonathanweisberg.org/post/Accuracy%20for%20Dummies%20-%20Part%204/
Thu, 23 Feb 2017 00:00:00 -0500http://jonathanweisberg.org/post/Accuracy%20for%20Dummies%20-%20Part%204/
<p>Last time we took Brier distance beyond two dimensions. <a href="http://jonathanweisberg.org/post/Accuracy for Dummies - Part 3/">We showed</a> that it’s “proper” in any finite number of dimensions. Today we’ll show that Euclidean distance is “improper” in any finite number of dimensions.</p>
<p>When I first sat down to write this post, I had in mind a straightforward generalization of <a href="http://jonathanweisberg.org/post/Accuracy for Dummies - Part 1/">our previous result</a> for Euclidean distance in two dimensions. And I figured it would be easy to prove.</p>
<p>Not so.</p>
<p>My initial conjecture was false, and worse, when I asked my accuracy-guru friends for the truth, nobody seemed to know. (They did offer lots of helpful suggestions, though.)</p>
<p>So today we’re muddling through on our own even more than usual. Here goes.</p>
<h1 id="background">Background</h1>
<p>Let’s recall where we are. We’ve been considering different ways of measuring the inaccuracy of a probability assignment given a possibility, or a “possible world”.</p>
<p>Let’s start today by regimenting our terminology. We’ve used these terms semi-formally for a while now. But let’s gather them here for reference, and to make them a little more precise.$
\renewcommand{\vec}[1]{\mathbf{#1}}
\newcommand{\p}{\vec{p}}
\newcommand{\q}{\vec{q}}
\newcommand{\u}{\vec{u}}
\newcommand{\EIpq}{EI_{\p}(\q)}
\newcommand{\EIpp}{EI_{\p}(\p)}
$</p>
<p>Given a number of dimensions $n$:</p>
<ul>
<li>A <em>probability assignment</em> $\p = (p_1, \ldots, p_n)$ is a vector of nonnegative real numbers that sum to $1$.</li>
<li>A <em>possible world</em> is a vector $\u$ of length $n$ containing all zeros except for a single $1$. (A <a href="https://en.wikipedia.org/wiki/Unit_vector" target="_blank">unit vector</a> of length $n$, in other words.)</li>
<li>A <em>measure of inaccuracy</em> $D(\p, \u)$ is a function that takes a probability assignment and a possible world and returns a real number.</li>
</ul>
<p>We’ve been considering two measures of inaccuracy. The first is the familiar Euclidean distance between $\p$ and $\u$. For example, when $\u = (1, 0, \ldots, 0)$ we have:
$$ \sqrt{(p_1 - 1)^2 + (p_2 - 0)^2 + \ldots + (p_n - 0)^2}.$$
The second way of measuring inaccuracy is less familiar, Brier distance, which is just the square of Euclidean distance:
$$ (p_1 - 1)^2 + (p_2 - 0)^2 + \ldots + (p_n - 0)^2.$$</p>
<p>What we found in $n = 2$ dimensions is that Euclidean distance is “unstable” in a way that Brier is not. If we measure inaccuracy using Euclidean distance, a probability assignment can expect some <em>other</em> probability assignment to do better accuracy-wise, i.e. to have lower inaccuracy.</p>
<p>In fact, given almost any probability assignment, the way to minimize expected inaccuracy is to leap to certainty in the most likely possibility. Given $(2/3, 1/3)$, for example, the way to minimize expected inaccuracy is to move to $(1,0)$.</p>
<p>Because Euclidean distance is unstable in this way, it’s called an “improper” measure of inaccuracy. So, two more bits of terminology:</p>
<ul>
<li>Given a probability assignment $\p$ and a measure of inaccuracy $D$, the <em>expected inaccuracy</em> of probability assignment $\q$, written $\EIpq$, is the weighted sum:
$$
\EIpq = p_1 D(\q,\u_1) + \ldots + p_n D(\q,\u_n),
$$
where $\u_i$ is the possible world with a $1$ at index $i$.</li>
<li>A measure of inaccuracy $D$ is <em>improper</em> if there is a probability assignment $\p$ such that for some assignment $\q \neq \p$, $\EIpq < \EIpp$ when inaccuracy is measured according to $D$.</li>
</ul>
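<p>To make these definitions concrete, here’s a minimal Python sketch. It isn’t taken from the <em>Mathematica</em> notebook, and the function names are just illustrative. It computes expected inaccuracy under both measures for the two-dimensional example $(2/3, 1/3)$.</p>

```python
import math

def euclid(q, u):
    """Euclidean distance between assignment q and possible world u."""
    return math.sqrt(sum((qk - uk) ** 2 for qk, uk in zip(q, u)))

def brier(q, u):
    """Brier distance: the square of Euclidean distance."""
    return sum((qk - uk) ** 2 for qk, uk in zip(q, u))

def worlds(n):
    """The n possible worlds: vectors of zeros with a single 1."""
    return [tuple(1.0 if k == i else 0.0 for k in range(n)) for i in range(n)]

def expected_inaccuracy(p, q, measure):
    """EI_p(q): the inaccuracy of q in each world, weighted by p."""
    return sum(p_i * measure(q, u) for p_i, u in zip(p, worlds(len(p))))

p = (2 / 3, 1 / 3)
certain = (1.0, 0.0)

# Euclidean distance: p expects (1, 0) to do better than p itself.
print(expected_inaccuracy(p, p, euclid), expected_inaccuracy(p, certain, euclid))
# Brier distance: p expects itself to do better than (1, 0).
print(expected_inaccuracy(p, p, brier), expected_inaccuracy(p, certain, brier))
```

<p>On this sketch, Euclidean distance comes out improper at $(2/3, 1/3)$, while Brier distance does not favour the leap to certainty.</p>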
<p>Last time we showed that Brier is <em>proper</em> in any finite number of dimensions $n$. Today our main task is to show that Euclidean distance is <em><strong>im</strong>proper</em> in any finite number of dimensions $n$.</p>
<p>But first, let’s get a tempting mistake out of the way.</p>
<h1 id="a-conjecture-and-its-refutation">A Conjecture and Its Refutation</h1>
<p>In <a href="http://jonathanweisberg.org/post/Accuracy for Dummies - Part 1/">our first post</a>, we saw that Euclidean distance isn’t just improper in two dimensions. It’s also <em>extremizing</em>: the assignment $(2/3, 1/3)$ doesn’t just expect <em>some</em> other assignment to do better accuracy-wise. It expects the assignment $(1,0)$ to do best!</p>
<p>At first I thought we’d be proving a straightforward generalization of that result today:</p>
<p><strong>Conjecture 1 (False).</strong> Let $(p_1, \ldots, p_n)$ be a probability assignment with a unique largest element $p_i$. If we measure inaccuracy by Euclidean distance, then $\EIpq$ is minimized when $\q = \u_i$.</p>
<p>Intuitively: expected inaccuracy is minimized by leaping to certainty in the most probable possibility. Turns out this is false in three dimensions. Here’s a</p>
<p><strong>Counterexample.</strong> Let’s define:
$$
\begin{align}
\p &= (5/12, 4/12, 3/12),\\<br />
\p' &= (6/12, 4/12, 2/12),\\<br />
\u_1 &= (1, 0, 0).
\end{align}
$$</p>
<p>Then we can calculate (or better, <a href="https://github.com/jweisber/a4d/blob/master/Euclid%20in%20the%20Round.nb" target="_blank">have <em>Mathematica</em> calculate</a>):
$$
\begin{align}
\EIpp &\approx .804,\\<br />
EI_{\p}(\p') &\approx .800,\\<br />
EI_{\p}(\u_1) &\approx .825.
\end{align}
$$
In this case $\EIpp < EI_{\p}(\u_1)$. So leaping to certainty doesn’t minimize expected inaccuracy (as measured by Euclidean distance).</p>
<p>Of course, staying put doesn’t minimize it either, since $EI_{\p}(\p') < \EIpp$.</p>
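<p>The three values in the counterexample can also be checked without <em>Mathematica</em>. Here’s a short Python sketch; the helper name <code>expected_inaccuracy</code> is illustrative, not something from the notebook.</p>

```python
import math

def expected_inaccuracy(p, q):
    """EI_p(q) with inaccuracy measured by Euclidean distance."""
    n = len(p)
    total = 0.0
    for i in range(n):
        # Distance from q to the possible world with a 1 at index i.
        dist = math.sqrt(sum((q[k] - (1.0 if k == i else 0.0)) ** 2 for k in range(n)))
        total += p[i] * dist
    return total

p = (5 / 12, 4 / 12, 3 / 12)
p_prime = (6 / 12, 4 / 12, 2 / 12)
u_1 = (1.0, 0.0, 0.0)

stay = expected_inaccuracy(p, p)         # staying put
shift = expected_inaccuracy(p, p_prime)  # moving part of the way
leap = expected_inaccuracy(p, u_1)       # leaping to certainty

print(round(stay, 3), round(shift, 3), round(leap, 3))
```

<p>The ordering is the point: moving part of the way beats staying put, which in turn beats leaping to certainty.</p>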
<p>So what <em>does</em> minimize it in this example? I asked <em>Mathematica</em> to minimize $\EIpq$ and got… nothing for days. Eventually I gave up waiting and asked instead for <a href="https://github.com/jweisber/a4d/blob/master/Euclid%20in%20the%20Round.nb" target="_blank">a numerical approximation of the minimum</a>. One second later I got:</p>
<p>$$EI_{\p}(0.575661, 0.250392, 0.173947) \approx 0.797432.$$</p>
<p>I have no idea what that is in more meaningful terms, I’m sorry to say. But at least we know it’s not anywhere near the extreme point $\u_1$ I conjectured at the outset. (See the <strong>Update</strong> at the end for a little more.)</p>
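<p>A crude way to sanity-check that numerical minimum is a brute-force grid search over three-dimensional probability assignments. The sketch below is Python, not the notebook’s method. The step size of $0.005$ is an arbitrary choice, and a grid only ever approximates the true minimum.</p>

```python
import math

def expected_inaccuracy(p, q):
    """EI_p(q) with inaccuracy measured by Euclidean distance."""
    n = len(p)
    total = 0.0
    for i in range(n):
        dist = math.sqrt(sum((q[k] - (1.0 if k == i else 0.0)) ** 2 for k in range(n)))
        total += p[i] * dist
    return total

p = (5 / 12, 4 / 12, 3 / 12)
steps = 200  # grid spacing of 1/200 = 0.005 (an arbitrary choice)

best_q, best_val = None, float("inf")
for a in range(steps + 1):
    for b in range(steps + 1 - a):
        # Every grid point is a probability assignment: entries sum to 1.
        q = (a / steps, b / steps, (steps - a - b) / steps)
        val = expected_inaccuracy(p, q)
        if val < best_val:
            best_q, best_val = q, val

reported = (0.575661, 0.250392, 0.173947)
print(best_q, best_val)
print(expected_inaccuracy(p, reported))
```

<p>The best grid point should land close to the reported minimizer, well away from the extreme point $\u_1$.</p>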
<h1 id="a-shortcut-and-its-shortcomings">A Shortcut and Its Shortcomings</h1>
<p>So I asked friends who do this kind of thing for a living how they handle the $n$-dimensional case. A couple of them suggested taking a shortcut around it!</p>
<blockquote>
<p>Look, you’ve already handled the two-dimensional case. And that’s just an instance of higher dimensional cases.</p>
<p>Take a probability assignment like (2/3, 1/3). We can also think of it as (2/3, 1/3, 0), or as (2/3, 0, 1/3, 0), etc.</p>
<p>No matter how many zeros we sprinkle around in there, the same thing is going to happen as in the two-dimensional case. Leaping to certainty in the 2/3 possibility will minimize expected inaccuracy. (Because possibilities with no probability make no difference to expected value calculations.)</p>
<p>So no matter how many dimensions we’re working in, there will always be <em>some</em> probability assignment where leaping to certainty minimizes expected inaccuracy. It just might have lots of zeros in it.</p>
<p>So Euclidean distance is, technically, improper in any finite number of dimensions.</p>
</blockquote>
<p>At first I thought that was good enough for philosophy, though I still wanted to know how to handle “no zeros” cases for the sake of mathematical clarity.</p>
<p>Then I realized there may be a philosophical reason to be dissatisfied with this shortcut. A lot of people endorse the <a href="http://philosophy.anu.edu.au/sites/default/files/Staying%20Regular.December%2028.2012.pdf" target="_blank">Regularity principle</a>: you should never assign zero probability to any possibility. For these people, the shortcut might be a dead end.</p>
<p>(Of course, maybe we shouldn’t embrace Regularity if we’re working in the accuracy framework. I won’t stop for that question here.)</p>
<h1 id="a-theorem-and-its-corollary">A Theorem and Its Corollary</h1>
<p>So let’s take the problem head on. We want to show that Euclidean distance is improper in $n > 2$ dimensions, even when there are “no zeros”. Two last bits of terminology:</p>
<ul>
<li>A probability assignment $(p_1, \ldots, p_n)$ is <em>regular</em> if $p_i > 0$ for all $i$.</li>
<li>A probability assignment $(p_1, \ldots, p_n)$ is <em>uniform</em> if $p_i = p_j$ for all $i,j$.</li>
</ul>
<p>So, for example, the assignment $(1/3, 1/3, 1/3)$ is both regular and uniform. Whereas the assignment $(2/5, 2/5, 1/5)$ is regular, but not uniform.</p>
<p>What we’ll show is that assignments like $(2/5, 2/5, 1/5)$ make Euclidean distance “unstable”: they expect some other assignment to do better, accuracy-wise. (Exactly which other assignment they’ll expect to do best isn’t always easy to say.)</p>
<p>(Though I try to keep the math in these posts as elementary as possible, this proof will use calculus. If you know a bit about derivatives, you should be fine. Technically we’ll use multi-variable calculus. But if you’ve worked with derivatives in single-variable calculus, that should be enough for the main ideas.)</p>
<p><strong>Theorem.</strong>
Let $\p = (p_1, \ldots, p_n)$ be a regular, non-uniform probability assignment. If inaccuracy is measured by Euclidean distance, then $EI_{\p}(\q)$ is not minimized when $\q = \p$.</p>
<p><em>Proof.</em>
Let $\p = (p_1, \ldots, p_n)$ be a regular and non-uniform probability assignment, and measure inaccuracy using Euclidean distance. Then:
$$
\begin{align}
EI_{\p}(\q) &= p_1 \sqrt{(q_1 - 1)^2 + \ldots + (q_n - 0)^2} + \ldots + p_n \sqrt{(q_1 - 0)^2 + \ldots + (q_n - 1)^2}\\<br />
&= p_1 \sqrt{(q_1 - 1)^2 + \ldots + q_n^2} + \ldots + p_n \sqrt{q_1^2 + \ldots + (q_n - 1)^2}
\end{align}
$$</p>
<p>The crux of our proof will be that the derivatives of this function are non-zero at the point $\q = \p$. Since the minimum of a function is always a <a href="https://en.wikipedia.org/wiki/Critical_point_(mathematics)" target="_blank">“critical point”</a>, that suffices to show that $\q = \p$ is not a minimum of $\EIpq$.</p>
<p>To start, we calculate the partial derivative of $\EIpq$ for an arbitrary $q_i$:
$$
\begin{align}
\frac{\partial}{\partial q_i} \EIpq
&=
\frac{\partial}{\partial q_i} \left( p_1 \sqrt{(q_1 - 1)^2 + \ldots + q_n^2} + \ldots + p_n \sqrt{q_1^2 + \ldots + (q_n - 1)^2} \right)\\<br />
&=
p_1 \frac{\partial}{\partial q_i} \sqrt{(q_1 - 1)^2 + \ldots + q_n^2} + \ldots + p_n \frac{\partial}{\partial q_i} \sqrt{q_1^2 + \ldots + (q_n - 1)^2}\\<br />
&= \quad
p_i \frac{q_i - 1}{\sqrt{(q_i - 1)^2 + \sum_{j \neq i} q_j^2}} + \sum_{j \neq i} p_j \frac{q_i}{\sqrt{(q_j - 1)^2 + \sum_{k \neq j} q_k^2}}\\<br />
&= \quad
\sum_{j \neq i} \frac{p_j q_i}{\sqrt{(q_j - 1)^2 + \sum_{k \neq j} q_k^2}} - \sum_{j \neq i} \frac{p_i q_j}{\sqrt{(q_i - 1)^2 + \sum_{k \neq i} q_k^2}}.
\end{align}
$$</p>
<p>Then we evaluate at $\q = \p$:
$$
\begin{align}
\frac{\partial}{\partial q_i} \EIpp
&= \sum_{j \neq i} \frac{p_i p_j}{\sqrt{(p_j - 1)^2 + \sum_{k \neq j} p_k^2}} - \sum_{j \neq i} \frac{p_i p_j}{\sqrt{(p_i - 1)^2 + \sum_{k \neq i} p_k^2}}
\end{align}
$$</p>
<p>Now, because $\p$ is not uniform, some of its elements are larger than others. And because it is finite, there is at least one largest element. When $p_i$ is one of these largest elements, then $\partial / \partial q_i \EIpp$ is negative.</p>
<p>Why?</p>
<p>In our equation for $\partial / \partial q_i \EIpp$, each positive term has a corresponding negative term whose numerator is identical. And when $p_i$ is a largest element of $\p$, the denominator of each negative term will never be larger, but will sometimes be smaller, than the denominator of its corresponding positive term. Subtracting $1$ from $p_i$ before squaring does more to reduce the sum of squares $p_i^2 + \sum_{j \neq i} p_j^2$ than subtracting $1$ from any smaller term would. It effectively removes the/a largest square from the sum and substitutes the smallest replacement. So the negative terms are never smaller, but are sometimes larger, than their positive counterparts.</p>
<p>If, on the other hand, $p_i$ is one of the smallest elements, then $\partial / \partial q_i \EIpp$ is positive. For then the reverse argument applies: the denominator of each negative term will never be smaller and will sometimes be larger than the denominator of the corresponding positive term. So the negative terms are never larger, but are sometimes smaller, than their positive counterparts.</p>
<p>We have shown that the partial derivatives of $\EIpq$ are non-zero at the point $\q = \p$. Thus $\p$ is not a critical point of $\EIpq$, and hence cannot be a minimum of $\EIpq$. <span class="floatright">$\Box$</span></p>
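<p>The sign pattern in the proof is easy to check numerically. The Python sketch below evaluates the partial derivatives at $\q = \p$, using the fact that the distance from $\p$ to $\u_j$ simplifies to $\sqrt{S - 2p_j + 1}$, where $S$ is the sum of the squares of $\p$’s elements. The function name <code>partials_at_p</code> is just illustrative.</p>

```python
import math

def partials_at_p(p):
    """Partial derivatives of EI_p(q) with respect to each q_i, at q = p."""
    s = sum(x * x for x in p)
    # Distance from p to world u_j:
    # sqrt((p_j - 1)^2 + sum_{k != j} p_k^2) = sqrt(s - 2 * p_j + 1).
    dist = [math.sqrt(s - 2 * x + 1) for x in p]
    grads = []
    for i, p_i in enumerate(p):
        # Term from world u_i, where q_i enters as (q_i - 1).
        g = p_i * (p_i - 1) / dist[i]
        # Terms from the other worlds u_j, where q_i enters as q_i.
        g += sum(p[j] * p_i / dist[j] for j in range(len(p)) if j != i)
        grads.append(g)
    return grads

non_uniform = partials_at_p((2 / 5, 2 / 5, 1 / 5))
uniform = partials_at_p((1 / 3, 1 / 3, 1 / 3))

print(non_uniform)  # negative for the largest elements, positive for the smallest
print(uniform)      # all (approximately) zero
```

<p>For $(2/5, 2/5, 1/5)$ the derivatives are negative in the two largest coordinates and positive in the smallest, as the proof predicts. For the uniform assignment they vanish, which is why the theorem excludes that case.</p>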
<p><strong>Corollary.</strong> Euclidean distance is improper in any finite number of dimensions.</p>
<p><em>Proof.</em> This is just a slight restatement of our theorem. If $\q = \p$ is not a minimum of $\EIpq$, then there is some $\q \neq \p$ such that $\EIpq < \EIpp$. <span class="floatright">$\Box$</span></p>
<h1 id="conjectures-awaiting-refutations">Conjectures Awaiting Refutations</h1>
<p>Notice, we’ve also shown something a bit stronger. We showed that the slope of $\EIpq$ at the point $\q = \p$ is always negative in the direction of $\p$’s largest element(s), and positive in the direction of its smallest element(s). That means we can always reduce expected inaccuracy by taking some small quantity away from the/a smallest element of $\p$ and adding it to the/a largest element. In other words, we can always reduce expected inaccuracy by moving <em>some</em> way towards perfect certainty in the/a possibility that $\p$ rates most probable.</p>
<p>However, we <em>haven’t</em> shown that repeatedly minimizing expected inaccuracy will, eventually, lead to certainty in the/a possibility that was most probable to begin with. For one thing, we haven’t shown that moving towards certainty in this direction minimizes expected inaccuracy at each step. We’ve only shown that moving in this direction reduces it.</p>
<p>Still, I’m pretty sure a result along these lines holds. Tinkering in <em>Mathematica</em> strongly suggests that the following Conjectures are true in any finite number of dimensions $n$:</p>
<p><strong>Conjecture 2.</strong> If a probability assignment gives greater than $1/ 2$ probability to some possibility, then expected inaccuracy is minimized by assigning probability 1 to that possibility. (But see the <strong>Update</strong> below.)</p>
<p><strong>Conjecture 3.</strong> Given a non-uniform probability assignment, repeatedly minimizing expected inaccuracy will, within a finite number of steps, increase the probability of the/a possibility that was most probable initially beyond $1/ 2$.</p>
<p>If these conjectures hold, then there’s still a weak-ish sense in which Euclidean distance is “extremizing” in $n > 2$ dimensions. Given a non-uniform probability assignment, repeatedly minimizing expected inaccuracy will eventually lead to greater than $1/ 2$ probability in the/a possibility that was most probable to begin with. Then, minimizing inaccuracy will lead in a single step to certainty in that possibility.</p>
<p>Proving these conjectures would close much of the gap between the theorem we proved and the false conjecture I started with. If you’re interested, you can use <a href="https://github.com/jweisber/a4d/blob/master/Euclid%20in%20the%20Round.nb" target="_blank">this <em>Mathematica</em> notebook</a> to test them.</p>
<p><strong>Update: Mar. 6, 2017.</strong> Thanks to some excellent help from <a href="https://mathematics.stanford.edu/people/department-directory/name/jonathan-love/" target="_blank">Jonathan Love</a>, I’ve tweaked this post (and greatly simplified <a href="http://jonathanweisberg.org/post/Accuracy%20for%20Dummies%20-%20Part%203/">the previous one</a>).</p>
<p>I changed the counterexample to the false Conjecture 1, which used to be $\p = (3/7, 2/7, 2/7)$ and $\p' = (4/7, 2/7, 1/7)$. That works fine, but it’s potentially misleading.</p>
<p>As Jonathan kindly pointed out, the minimum point then is something quite nice. It’s obtained by moving in the $x$-dimension from $3/7$ to $\sqrt{3/7}$, and correspondingly reducing the probability in the $y$ and $z$ dimensions in equal parts.</p>
<p>But, in general, moving to the square root of the largest $p_i$ (when there is one) doesn’t minimize $\EIpq$. Even in the special case where all the other elements in the vector are equal, this doesn’t generally work.</p>
<p>Jonathan did solve that special case, though, and he found at least one interesting result connected with Conjecture 2. There appear to be cases where $p_i < 1/ 2$ for all $i$, and yet $\EIpq$ is still minimized by going directly to the extreme. For example, $\p = (.465, .2675, .2675)$.</p>
Editorial Gravity
http://jonathanweisberg.org/post/Editorial%20Gravity/
Wed, 22 Feb 2017 10:44:10 -0500http://jonathanweisberg.org/post/Editorial%20Gravity/
<p>We’ve all been there. One referee is positive, the other negative, and the editor decides to reject the submission.</p>
<p>I’ve heard it said editors tend to be conservative given the recommendations of their referees. And that jibes with my experience as an author.</p>
<p>So is there anything to it—is “editorial gravity” a real thing? And if it is, how strong is its pull? Is there some magic function editors use to compute their decision based on the referees’ recommendations?</p>
<p>In this post I’ll consider how things shake out at <a href="http://www.ergophiljournal.org/" target="_blank"><em>Ergo</em></a>.</p>
<h1 id="decision-rules">Decision Rules</h1>
<p><em>Ergo</em> doesn’t have any rule about what an editor’s decision should be given the referees’ recommendations. In fact, we explicitly discourage our editors from relying on any such heuristic. Instead we encourage them to rely on their judgment about the submission’s merits, informed by the substance of the referees’ reports.</p>
<p>Still, maybe there’s some natural law of journal editing waiting to be discovered here, or some unwritten rule.</p>
<p>Referees choose from four possible recommendations at <em>Ergo</em>: Reject, Major Revisions, Minor Revisions, or Accept. Let’s consider four simple rules we might use to predict an editor’s decision, given the recommendations of their referees.</p>
<ol>
<li>Max: the editor follows the recommendation of the most positive referee. (Ha!)</li>
<li>Mean: the editor “splits the difference” between the referees’ recommendations.
<ul>
<li>Accept + Major Revisions → Minor Revisions, for example.</li>
<li>When the difference is intermediate between possible decisions, we’ll stipulate that this rule “rounds down”.
<ul>
<li>Major Revisions + Minor Revisions → Major Revisions, for example.</li>
</ul></li>
</ul></li>
<li>Min: the editor follows the recommendation of the most negative referee.</li>
<li>Less-than-Min: the editor’s decision is a step more negative than either of the referees’.
<ul>
<li>Major Revisions + Minor Revisions → Reject, for example.</li>
<li>Except obviously that Reject + anything → Reject.</li>
</ul></li>
</ol>
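<p>The four rules can be sketched in a few lines of Python. This is an illustration, not the R code behind the post: the numeric ranks (Reject = 1 up to Accept = 4) and the function name <code>predict</code> are assumptions made for the example.</p>

```python
# Assumed numeric ranks for the four recommendations (illustrative only).
RANKS = {"Reject": 1, "Major Revisions": 2, "Minor Revisions": 3, "Accept": 4}
NAMES = {rank: name for name, rank in RANKS.items()}

def predict(rule, recommendations):
    """Predict the editor's decision from the referees' recommendations."""
    ranks = [RANKS[r] for r in recommendations]
    if rule == "max":
        rank = max(ranks)
    elif rule == "mean":
        # Integer division "rounds down" when the mean falls between decisions.
        rank = sum(ranks) // len(ranks)
    elif rule == "min":
        rank = min(ranks)
    elif rule == "less-than-min":
        # One step more negative than the harshest referee, floored at Reject.
        rank = max(min(ranks) - 1, RANKS["Reject"])
    else:
        raise ValueError(f"unknown rule: {rule}")
    return NAMES[rank]

pair = ["Major Revisions", "Minor Revisions"]
for rule in ("max", "mean", "min", "less-than-min"):
    print(rule, "->", predict(rule, pair))
```

<p>Scoring a rule then amounts to counting how often its prediction matches the editor’s actual decision.</p>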
<p>Do any of these rules do a decent job of predicting editorial decisions? If so, which does best?</p>
<h1 id="a-test">A Test</h1>
<p>Let’s run the simplest test possible. We’ll go through the externally reviewed submissions in <em>Ergo</em>’s database and see how often each rule makes the correct prediction.</p>
<p><img src="http://jonathanweisberg.org/img/editorial_gravity_files/unnamed-chunk-2-1.png" alt="" /></p>
<p>Not only was Min the most accurate rule, its predictions were correct 85% of the time! (The sample size here is 233 submissions, by the way.) Apparently, editorial gravity is a real thing, at least at <em>Ergo</em>.</p>
<p>Of course, <em>Ergo</em> might be atypical here. It’s a new journal, and online-only with no regular publication schedule. So there’s some pressure to play it safe, and no incentive to accept papers in order to fill space.</p>
<p>But let’s suppose for a moment that <em>Ergo</em> is typical as far as editorial gravity goes. That raises some questions. Here are two.</p>
<h1 id="two-questions">Two Questions</h1>
<p>First question: can we improve on the Min rule? Is there a not-too-complicated heuristic that’s even more accurate?</p>
<p>Visualizing our data might help us spot any patterns. Typically there are two referees, so we can plot most submissions on a plane according to the referees’ recommendations. Then we can colour them according to the editor’s decision. Adding a little random jitter to make all the points visible:</p>
<p><img src="http://jonathanweisberg.org/img/editorial_gravity_files/unnamed-chunk-3-1.png" alt="" /></p>
<p>To my eye this looks a lot like the pattern of concentric-corners you’d expect from the Min rule. Though not exactly, especially when the two referees strongly disagree—the top-left and bottom-right corners of the plot. Still, other than treating cases of strong disagreement as a tossup, no simple way of improving on the Min rule jumps out at me.</p>
<p>Second question: if editorial gravity is a thing, is it a good thing or a bad thing?</p>
<p>I’ll leave that as an exercise for the reader.</p>
<h1 id="technical-note">Technical Note</h1>
<p>This post was written in R Markdown and the source code is <a href="https://github.com/jweisber/rgo/blob/master/editorial gravity/editorial gravity.Rmd" target="_blank">available on GitHub</a>.</p>
Gender & Journal Referees
http://jonathanweisberg.org/post/Referee%20Gender/
Mon, 20 Feb 2017 09:34:10 -0500http://jonathanweisberg.org/post/Referee%20Gender/
<p>We looked at author gender in <a href="http://jonathanweisberg.org/post/Author Gender/">a previous post</a>; today let’s consider referees. Does their gender have any predictive value?</p>
<p>Once again our discussion only covers men and women because we don’t have the data to support a deeper analysis.<sup class="footnote-ref" id="fnref:1"><a rel="footnote" href="#fn:1">1</a></sup></p>
<p>Using data from <a href="http://www.ergophiljournal.org/" target="_blank"><em>Ergo</em></a>, we’ll consider the following questions:</p>
<ol>
<li><em>Requests</em>. How are requests to referee distributed between men and women? Are men more likely to be invited, for example?</li>
<li><em>Responses</em>. Does gender inform a referee’s response to a request? Are women more likely to say ‘yes’, for example?</li>
<li><em>Response-speed</em>. Does gender inform how quickly a referee responds to an invitation (whether to agree or to decline)? Do men take longer to agree/decline an invitation, for example?</li>
<li><em>Completion-speed</em>. If a referee does agree to provide a report, does their gender inform how quickly they’ll complete that report? Do men and women tend to complete their reports in the same time-frame?</li>
<li><em>Recommendations</em>. Does gender inform how positive/negative a referee’s recommendation is? Are men and women equally likely to recommend that a submission be rejected, for example?</li>
<li><em>Influence</em>. Does a referee’s gender affect the influence of their recommendation on the editor’s decision? Are the recommendations of male referees more likely to be followed, for example?</li>
</ol>
<p>A quick overview of our data set: there are a total of 1526 referee-requests in <em>Ergo</em>’s database. But only 1394 are included in this analysis. I’ve excluded:</p>
<ol>
<li>Requests to review an invited resubmission, since these are a different sort of beast.</li>
<li>Pending requests and reports, since the data for these are incomplete.</li>
<li>A handful of cases where the referee’s gender is either unknown, or doesn’t fit the male/female classification.</li>
</ol>
<h1 id="requests">Requests</h1>
<p>How are requests distributed between men and women? 322 of our 1394 requests went to women, or 23.1% (1072 went to men, or 76.9%).</p>
<p>How does this compare to the way men and women are represented in academic philosophy in general? Different sources and different subpopulations yield a range of estimates.</p>
<p>At the low end, we saw in <a href="http://jonathanweisberg.org/post/Author Gender/">an earlier post</a> that about 15.3% of <em>Ergo</em>’s submissions come from women. The PhilPapers survey yields a range from 16.2% (<a href="https://philpapers.org/surveys/demographics.pl" target="_blank">all respondents</a>) to 18.4% (<a href="https://philpapers.org/surveys/demographics.pl?affil=Target+faculty&survey=8" target="_blank">“target” faculty</a>). And sources cited in <a href="http://www.faculty.ucr.edu/~eschwitz/SchwitzPapers/WomenInPhil-160315b.pdf" target="_blank">Schwitzgebel & Jennings</a> estimate the percentage of women faculty in various English speaking countries at 23% for Australia, 24% for the U.K., and 19–26% for the U.S.</p>
<p>So we have a range of baseline estimates from 15% to 26%. For comparison, the 95% confidence interval around our 23.1% finding is (21%, 25.4%).</p>
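<p>For readers who want to reproduce an interval like this, here’s a Python sketch using the Wilson score interval. The choice of method is an assumption: the post doesn’t say how the interval was computed, though this one happens to match the reported figures after rounding.</p>

```python
import math

def wilson_interval(successes, n, z=1.959964):
    """Wilson score interval for a proportion (z = 1.959964 gives 95%)."""
    p_hat = successes / n
    denom = 1 + z ** 2 / n
    center = (p_hat + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half_width, center + half_width

# 322 of the 1394 requests went to women.
low, high = wilson_interval(322, 1394)
print(f"{low:.1%} to {high:.1%}")
```

<p>The same function applies to any count out of a total, e.g. the share of “Agreed” responses that came from women.</p>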
<h1 id="responses">Responses</h1>
<p>Do men and women differ in their responses to these requests? Here are the raw numbers:</p>
<table>
<thead>
<tr>
<th align="left"></th>
<th align="right">Agreed</th>
<th align="right">Declined / No Response / Canceled</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">Female</td>
<td align="right">101</td>
<td align="right">221</td>
</tr>
<tr>
<td align="left">Male</td>
<td align="right">403</td>
<td align="right">669</td>
</tr>
</tbody>
</table>
<p>The final column calls for some explanation. I’m lumping together several scenarios here: (i) the referee responds to decline the request, (ii) the referee never responds, (iii) the editors cancel the request because it was made in error. Unfortunately, these three scenarios are hard to distinguish based on the raw data. For example, sometimes a referee declines by email rather than via our online system, and the handling editor then cancels the request instead of marking it as “Declined”.</p>
<p>With that in mind, here are the proportions graphically:</p>
<p><img src="http://jonathanweisberg.org/img/referee_gender_files/unnamed-chunk-6-1.png" alt="" /></p>
<p>Men agreed more often than women: approximately 38% vs. 31%. And this difference is statistically significant.<sup class="footnote-ref" id="fnref:0"><a rel="footnote" href="#fn:0">2</a></sup></p>
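<p>The test statistic can be recomputed from the raw counts above. The Python sketch below runs a chi-square test of independence on a 2×2 table. It assumes Yates’ continuity correction, which is the default for 2×2 tables in R’s <code>chisq.test</code>; the function name is illustrative.</p>

```python
import math

def chi_square_2x2(table, yates=True):
    """Chi-square test of independence for a 2x2 table of counts."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            diff = abs(observed - expected)
            if yates:
                # Continuity correction: shrink each deviation by 0.5.
                diff = max(diff - 0.5, 0.0)
            stat += diff ** 2 / expected
    # With one degree of freedom, the p-value is erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Rows: Female, Male. Columns: Agreed, Declined / No Response / Canceled.
stat, p_value = chi_square_2x2([[101, 221], [403, 669]])
print(round(stat, 2), round(p_value, 2))
```

<p>The $p$-value sits right at the conventional $0.05$ threshold, so the result is best read as borderline.</p>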
<p>Note that women and men accounted for about 20% and 80% of the “Agreed” responses, respectively. Whether this figure differs significantly from the gender makeup of “the general population” depends, as before, on the source and subpopulation we use for that estimate.</p>
<p>We saw that estimates of female representation ranged from roughly 15% to 26%. For comparison, the 95% confidence interval around our 20% finding is (16.8%, 23.8%).</p>
<h1 id="response-speed">Response-speed</h1>
<p>Do men and women differ in response-speed—in how quickly they respond to a referee request (whether to agree or to decline)?</p>
<p>The average response-time for women is 1.92 days, and for men it’s 1.58 days. This difference is not statistically significant.<sup class="footnote-ref" id="fnref:3"><a rel="footnote" href="#fn:3">3</a></sup></p>
<p>A boxplot likewise suggests that men and women have similar interquartile ranges:</p>
<p><img src="http://jonathanweisberg.org/img/referee_gender_files/unnamed-chunk-9-1.png" alt="" /><!-- --></p>
<h1 id="completion-speed">Completion-speed</h1>
<p>What about completion-speed: is there any difference in how long men and women take to complete their reports?</p>
<p>Women took 27.6 days on average, while men took 23.8 days. This difference is statistically significant.<sup class="footnote-ref" id="fnref:4"><a rel="footnote" href="#fn:4">4</a></sup></p>
<p>Does that mean men are more likely to complete their reports on time? Not necessarily. Here’s a frequency polygon showing when reports were completed:</p>
<p><img src="http://jonathanweisberg.org/img/referee_gender_files/unnamed-chunk-11-1.png" alt="" /><!-- --></p>
<p>The spike at the four-week mark corresponds to the standard due date. We ask referees to submit their reports within 28 days of the initial request.</p>
<p>It looks like men had a stronger tendency to complete their reports early. But were they more likely to complete them on time?</p>
<p>One way to tackle this question is to look at how completed reports accumulate with time (the <a href="https://en.wikipedia.org/wiki/Empirical_distribution_function" target="_blank">empirical cumulative distribution</a>):</p>
<p><img src="http://jonathanweisberg.org/img/referee_gender_files/unnamed-chunk-12-1.png" alt="" /><!-- --></p>
<p>As expected, the plot shows that men completed their reports early with greater frequency. But it also looks like women and men converged around the four-week mark, when reports were due.</p>
<p>Another way of approaching the question is to classify reports as either “On Time” or “Late”, according to whether they were completed before Day 29.</p>
<table>
<thead>
<tr>
<th align="left"></th>
<th align="right">On Time</th>
<th align="right">Late</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">Female</td>
<td align="right">50</td>
<td align="right">38</td>
</tr>
<tr>
<td align="left">Male</td>
<td align="right">242</td>
<td align="right">121</td>
</tr>
</tbody>
</table>
<p><img src="http://jonathanweisberg.org/img/referee_gender_files/unnamed-chunk-14-1.png" alt="" /><!-- --></p>
<p>A chi-square test of independence then finds no statistically significant difference.<sup class="footnote-ref" id="fnref:6"><a rel="footnote" href="#fn:6">5</a></sup></p>
<p>Apparently men and women differed in their tendency to be early, but not necessarily in their tendency to be on time.</p>
<h1 id="recommendations">Recommendations</h1>
<p>Did male and female referees differ in their recommendations to the editors?</p>
<p><em>Ergo</em> offers referees four recommendations to choose from. The raw numbers:</p>
<table>
<thead>
<tr>
<th align="left"></th>
<th align="right">Reject</th>
<th align="right">Major Revisions</th>
<th align="right">Minor Revisions</th>
<th align="right">Accept</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">Female</td>
<td align="right">42</td>
<td align="right">29</td>
<td align="right">9</td>
<td align="right">8</td>
</tr>
<tr>
<td align="left">Male</td>
<td align="right">154</td>
<td align="right">103</td>
<td align="right">61</td>
<td align="right">45</td>
</tr>
</tbody>
</table>
<p>In terms of frequencies:</p>
<p><img src="http://jonathanweisberg.org/img/referee_gender_files/unnamed-chunk-16-1.png" alt="" /><!-- --></p>
<p>The differences here are not statistically significant according to a chi-square test of independence.<sup class="footnote-ref" id="fnref:5"><a rel="footnote" href="#fn:5">6</a></sup></p>
<h1 id="influence">Influence</h1>
<p>Does a referee’s gender affect whether the editor follows their recommendation? We can tackle this question a few different ways.</p>
<p>One way is to just tally up those cases where the editor’s decision was the same as the referee’s recommendation, and those where it was different.</p>
<table>
<thead>
<tr>
<th align="left"></th>
<th align="right">Same</th>
<th align="right">Different</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">Female</td>
<td align="right">51</td>
<td align="right">37</td>
</tr>
<tr>
<td align="left">Male</td>
<td align="right">206</td>
<td align="right">157</td>
</tr>
</tbody>
</table>
<p><img src="http://jonathanweisberg.org/img/referee_gender_files/unnamed-chunk-17-1.png" alt="" /><!-- --></p>
<p>Clearly there’s no statistically significant difference between male and female referees here.<sup class="footnote-ref" id="fnref:7"><a rel="footnote" href="#fn:7">7</a></sup></p>
<p>A second approach would be to assign numerical ranks to referees’ recommendations and editors’ decisions: Reject = 1, Major Revisions = 2, etc. Then we can consider how far the editor’s decision is from the referee’s recommendation. For example, a decision of Accept is 3 away from a recommendation of Reject, while a decision of Major Revisions is 2 away from a recommendation of Accept.</p>
<p>By this measure, the average distance between the referee’s recommendation and the editor’s decision was 0.57 for women and 0.56 for men—clearly not a statistically significant difference.<sup class="footnote-ref" id="fnref:8"><a rel="footnote" href="#fn:8">8</a></sup></p>
<h1 id="summary">Summary</h1>
<p>Men received more requests to referee than women, as expected given the well known gender imbalance in academic philosophy. The distribution of requests between men (76.9%) and women (23.1%) was in line with some estimates of the gender makeup of academic philosophy, though not all estimates.</p>
<p>Men were more likely to agree to a request (38% vs. 31%), a statistically significant difference. Women accounted for about 20% of the “Agreed” responses, however, consistent with most (but not all) estimates of the gender makeup of academic philosophy.</p>
<p>There was no statistically significant difference in response-speed, but there was in the speed with which reports were completed (23.8 days on average for men, 27.6 days for women). This difference appears to be due to a stronger tendency on the part of men to complete their reports early, though not necessarily a greater chance of meeting the deadline.</p>
<p>Finally, there was no statistically significant difference in the recommendations of male and female referees, or in editors’ uptake of those recommendations.</p>
<h1 id="technical-notes">Technical Notes</h1>
<p>This post was written in R Markdown and the source is <a href="https://github.com/jweisber/rgo/blob/master/referee%20gender/referee%20gender.Rmd" target="_blank">available on GitHub</a>. I’m new to both R and classical statistics, and this post is a learning exercise for me. So I encourage you to check the code and contact me with corrections.</p>
<div class="footnotes">
<hr />
<ol>
<li id="fn:1">Unlike in the previous analysis of author gender, however, here we do have a few known cases where either (i) the referee identifies as neither male nor female, or (ii) they identify as something more specific, e.g. “transgender male” rather than just “male”. But these cases are still too few for statistical analysis.
<a class="footnote-return" href="#fnref:1"><sup>[return]</sup></a></li>
<li id="fn:0">$\chi^2$(1, <em>N</em> = 1394) = 3.89, <em>p</em> = 0.05.
<a class="footnote-return" href="#fnref:0"><sup>[return]</sup></a></li>
<li id="fn:3"><em>t</em>(437.43) = -1.63, <em>p</em> = 0.1.
<a class="footnote-return" href="#fnref:3"><sup>[return]</sup></a></li>
<li id="fn:4"><em>t</em>(144.26) = -2.46, <em>p</em> = 0.02
<a class="footnote-return" href="#fnref:4"><sup>[return]</sup></a></li>
<li id="fn:6">$\chi^2$(1, <em>N</em> = 451) = 2.59, <em>p</em> = 0.11.
<a class="footnote-return" href="#fnref:6"><sup>[return]</sup></a></li>
<li id="fn:5">$\chi^2$(3, <em>N</em> = 451) = 3.6, <em>p</em> = 0.31.
<a class="footnote-return" href="#fnref:5"><sup>[return]</sup></a></li>
<li id="fn:7">$\chi^2$(1, <em>N</em> = 451) = 0.01, <em>p</em> = 0.93.
<a class="footnote-return" href="#fnref:7"><sup>[return]</sup></a></li>
<li id="fn:8"><em>t</em>(117.57) = 0.07, <em>p</em> = 0.95.
<a class="footnote-return" href="#fnref:8"><sup>[return]</sup></a></li>
</ol>
</div>
In Defense of Reviewer 2
http://jonathanweisberg.org/post/Reviewer%202/
Mon, 06 Feb 2017 10:36:10 -0500http://jonathanweisberg.org/post/Reviewer%202/
<p>Spare a thought for Reviewer 2, that much-maligned shade of academe. There’s even <a href="https://twitter.com/hashtag/reviewer2" target="_blank">a hashtag</a> dedicated to the joke:</p>
<p><blockquote class="twitter-tweet tw-align-center" data-lang="en"><p lang="en" dir="ltr">A rare glimpse of reviewer 2, seen here in their natural habitat <a href="https://t.co/lpT1BVhDCX">pic.twitter.com/lpT1BVhDCX</a></p>— Aidan McGlynn (@AidanMcGlynn) <a href="https://twitter.com/AidanMcGlynn/status/820647829446283264">January 15, 2017</a></blockquote>
<script async src="http://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
<p>But is it just a joke? Order could easily matter here.</p>
<p>Referees invited later weren’t the editor’s first choice, after all. Maybe they’re less competent, less likely to appreciate your brilliant insights as an author. Or maybe they’re more likely to miss well-disguised flaws! Then we should expect Reviewer 2 to be the more <em>generous</em> one.</p>
<p>Come to think of it, we can order referees in other ways besides order-of-invite. We might order them according to who completes their report fastest, for example. And faster referees might be more careless, hence more dismissive. Or they might be less critical and thus more generous.</p>
<p>There’s a lot to consider. Let’s investigate, using <a href="http://www.ergophiljournal.org/" target="_blank"><em>Ergo</em></a>’s data, <a href="http://jonathanweisberg.org/tags/rgo/">as usual</a>.</p>
<h1 id="severity-generosity">Severity & Generosity</h1>
<p>Reviewer 2 is accused of a lot. It’s not just that their overall take is more severe; they also tend to miss the point. They’re irresponsible and superficial in their reading. And to the extent they do appreciate the author’s point, their objections are poorly thought out. What’s more, if they bother to demand revisions, their demands are unreasonable.</p>
<p>We can’t measure these things directly, of course. But we can estimate a referee’s generosity indirectly, using their recommendation to the editors as a proxy.</p>
<p><em>Ergo</em>’s referees choose from four possible recommendations: Reject, Major Revisions, Minor Revisions, and Accept. To estimate a referee’s generosity, we’ll assign these recommendations numerical ranks, from 1 (Reject) up through 4 (Accept).</p>
<p>The higher this number, the more generous the referee; the lower, the more severe.</p>
<h1 id="invite-order">Invite Order</h1>
<p>Is there any connection between the order in which referees are invited and their severity?</p>
<p>Usually an editor has to try a few people before they get two takers. So we can assign each potential referee an “invite rank”. The first person asked has rank 1, the second person asked has rank 2, and so on.</p>
<p>Is there a correlation between invite rank and severity?</p>
<p>Here’s a plot of invite rank (<em>x</em>-axis) and generosity (<em>y</em>-axis). (The points have non-integer heights because I’ve added some random <a href="http://r4ds.had.co.nz/data-visualisation.html#position-adjustments" target="_blank">“jitter”</a> to make them all visible. Otherwise you’d just see an uninformative grid.)</p>
<p><img src="http://jonathanweisberg.org/img/reviewer_2_files/unnamed-chunk-2-1.png" alt="" /></p>
<p>The blue curve shows the overall trend in the data.<sup class="footnote-ref" id="fnref:1"><a rel="footnote" href="#fn:1">1</a></sup> It’s basically flat all the way through, except at the far-right end where the data is too sparse to be informative.</p>
<p>We can also look at the classic measure of correlation known as <a href="https://en.wikipedia.org/wiki/Spearman's_rank_correlation_coefficient" target="_blank">Spearman’s rho</a>. The estimate is essentially 0 given our data ($r_s$ = 0.01).<sup class="footnote-ref" id="fnref:2"><a rel="footnote" href="#fn:2">2</a></sup></p>
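<p>For readers who want to see what Spearman’s rho actually computes, here’s a minimal sketch in Python. The data in it is made up purely for illustration (it is not <em>Ergo</em>’s data), and the function names are my own. The idea is just to replace each value by its rank, giving tied values the average of their ranks, and then take the ordinary correlation of the ranks.</p>

```python
# Hypothetical data for illustration only; these are NOT Ergo's records.
invite_rank = [1, 2, 3, 4, 5, 6, 7, 8]
generosity = [2, 1, 3, 1, 2, 4, 1, 2]  # 1 = Reject ... 4 = Accept


def average_ranks(values):
    """Rank values from 1 upward; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        # Extend the run [start, end] over all positions holding the same value.
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(xs, ys):
    """Ordinary (Pearson) correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman_rho(xs, ys):
    """Spearman's rho: the Pearson correlation of the average ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


print(round(spearman_rho(invite_rank, generosity), 3))
```

<p>Because generosity only takes four values, ties are everywhere, which is why the sketch uses average ranks rather than the shortcut formula that assumes no ties.</p>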
<p>Evidently, invite-rank has no discernible impact on severity.</p>
<h1 id="speed">Speed</h1>
<p>But now let’s look at the speed with which a referee completes their report:</p>
<p><img src="http://jonathanweisberg.org/img/reviewer_2_files/unnamed-chunk-4-1.png" alt="" /></p>
<p>Here an upward trend is discernible. And our estimate of Spearman’s rho agrees: $r_s$ = 0.1, a small but non-trivial correlation.<sup class="footnote-ref" id="fnref:3"><a rel="footnote" href="#fn:3">3</a></sup></p>
<p>Apparently, referees who take longer tend to be more generous!</p>
<h1 id="my-take">My Take</h1>
<p>I find these results encouraging, for the most part.</p>
<p>It’s nice to know that an editor’s first choice for a referee is the same as their fifth, as far as how severe or generous they’re likely to be.</p>
<p>It’s also nice to know that the speed with which a referee completes their report doesn’t <em>hugely</em> inform their severity.</p>
<p>We might well worry that faster referees are unduly severe. But this worry is tempered by a few considerations.</p>
<p>For one thing, the effect we found is small enough that it could just be noise. It is detectable using tools like regression and significance testing, so it’s not to be dismissed out of hand. But we might also do well to heed the wisdom of <a href="https://xkcd.com/1725/" target="_blank">XKCD</a> here:</p>
<p><img src="https://imgs.xkcd.com/comics/linear_regression_2x.png" alt="" /></p>
<p>Even if the effect is real, though, it could be a good thing just as easily as a bad thing.</p>
<p>True, referees who work fast might be sloppy and dismissive. And those who take longer might feel guiltier and thus be unduly generous.</p>
<p>But maybe referees who are more on the ball are both more prompt and more apt to spot a submission’s flaws. Or (as my coeditor Franz Huber pointed out) manuscripts that should clearly be rejected might be easier to referee on average, hence faster.</p>
<p>It’s hard to know what to make of this effect, if it is an effect. Clearly, <a href="https://twitter.com/hashtag/moreresearchisneeded" target="_blank">#MoreResearchIsNeeded</a>.</p>
<h1 id="technical-notes">Technical Notes</h1>
<p>This post was written in R Markdown and the source is <a href="https://github.com/jweisber/rgo/blob/master/reviewer%202/reviewer%202.Rmd" target="_blank">available on GitHub</a>. I’m new to both R and statistics, and this post is a learning exercise for me. So I encourage you to check the code and contact me with corrections.</p>
<div class="footnotes">
<hr />
<ol>
<li id="fn:1">Specifically, the blue curve is a regression curve using the <a href="https://en.wikipedia.org/wiki/Local_regression#Definition_of_a_LOESS_model" target="_blank">LOESS</a> method of fit.
<a class="footnote-return" href="#fnref:1"><sup>[return]</sup></a></li>
<li id="fn:2">A significance test of the null hypothesis $\rho_s$ = 0 yields <em>p</em> = 0.87.
<a class="footnote-return" href="#fnref:2"><sup>[return]</sup></a></li>
<li id="fn:3">Testing the null hypothesis $\rho_s$ = 0 yields <em>p</em> = 0.03.
<a class="footnote-return" href="#fnref:3"><sup>[return]</sup></a></li>
</ol>
</div>
http://jonathanweisberg.org/2/
Wed, 01 Feb 2017 10:44:10 -0500http://jonathanweisberg.org/2/<ul>
<li>Three Lectures in Formal Epistemology, NIP 2010: 1) <a href="http://jonathanweisberg.org/pdf/NIP - ULPs.pdf">Upper & Lower Probabilities</a>, 2) <a href="http://jonathanweisberg.org/pdf/NIP - DST.pdf">Dempster-Shafer Theory</a>, and 3) <a href="http://jonathanweisberg.org/pdf/NIP - Pollock.pdf">Pollock’s Theory of Defeasible Reasoning</a></li>
<li><a href="http://jonathanweisberg.org/pdf/C_R_and_SKvWS.pdf">Conditionalization Without Reflection</a> — An extended version of <a href="http://jonathanweisberg.org/pdf/C_R_and_SKv2.SP.pdf">Conditionalization, Reflection, and Self-Knowledge</a></li>
</ul>
Accuracy for Dummies, Part 3: Beyond the Second Dimension
http://jonathanweisberg.org/post/Accuracy%20for%20Dummies%20-%20Part%203/
Fri, 27 Jan 2017 00:00:00 -0500http://jonathanweisberg.org/post/Accuracy%20for%20Dummies%20-%20Part%203/
<p>Last time we saw why accuracy-mavens prefer Brier distance to Euclidean distance. But we did everything in two dimensions. That’s fine for a coin toss, with only two possibilities. But what if there are three doors and one of them has a prize behind it??</p>
<p>Don’t panic! Today we’re going to verify that Brier distance is still a proper way of measuring inaccuracy, even when there are more than two possibilities. (Next time we’ll talk about Euclidean distance with more than two possibilities.)</p>
<p>Let’s start small, with just three possibilities. $\renewcommand{\vec}[1]{\mathbf{#1}}\newcommand{\p}{\vec{p}}\newcommand{\q}{\vec{q}}\newcommand{\v}{\vec{v}}\newcommand{\EIpq}{EI_{\p}(\q)}\newcommand{\EIpp}{EI_{\p}(\p)}$</p>
<h1 id="three-possibilities">Three Possibilities</h1>
<p>You’re on a game show; there are three doors; one has a prize behind it. The three possibilities are represented by the vertices $(1,0,0)$, $(0,1,0)$, and $(0,0,1)$:</p>
<p><img src="http://jonathanweisberg.org/img/accuracy/Three Vertices.png" alt="" /></p>
<p>Your credences are given by some probability assignment $(p_1, p_2, p_3)$. It might be $(1/ 3, 1/ 3, 1/ 3)$ but it could be anything… $(7/ 10, 2/ 10, 1/ 10)$, for example.</p>
<p>In case you’re curious, here’s what the range of possible probability assignments looks like in graphical terms:</p>
<p><img src="http://jonathanweisberg.org/img/accuracy/Three Vertices with Hull.png" alt="" /></p>
<p>The triangular surface is the three-dimensional analogue of the diagonal line in <a href="http://jonathanweisberg.org/img/accuracy/2D Dominance Diagram.png">the two-dimensional diagram</a> from our <a href="http://jonathanweisberg.org/post/Accuracy for Dummies - Part 1/">first post</a> in this series.</p>
<p>It’ll be handy to refer to points on this surface using single letters, like $\p$ for $(p_1, p_2, p_3)$. We’ll write these letters in bold, to distinguish a sequence of numbers like $\p$ from a single number like $p_1$. (In math-speak, $\p$ is a <em>vector</em> and $p_1$ is a <em>scalar</em>.)</p>
<p>Our job is to show that Brier distance is “proper” in three dimensions. Let’s recall what that means: given a point $\p$, the expected Brier distance (according to $\p$) of a point $\q = (q_1, q_2, q_3)$ from the three vertices is always smallest when $\q = \p$.</p>
<p>What does <em>that</em> mean?</p>
<p>Recall, the Brier distance from $\q$ to the vertex $(1, 0, 0)$ is:
$$
(q_1 - 1)^2 + (q_2 - 0)^2 + (q_3 - 0)^2
$$
Or, more succinctly:
$$
(q_1 - 1)^2 + q_2^2 + q_3^2
$$
There is one such sum for each of the three vertices. The <em>expected</em> Brier distance of $\q$ according to $\p$ weights each sum by the probability $\p$ assigns to the corresponding vertex, then adds them up:
$$
\begin{align}
&\quad\quad p_1 \left( (q_1 - 1)^2 + q_2^2 + q_3^2 \right)\\<br />
&\quad + p_2 \left( q_1^2 + (q_2 - 1)^2 + q_3^2 \right)\\<br />
&\quad + p_3 \left( q_1^2 + q_2^2 + (q_3 - 1)^2 \right)
\end{align}
$$
We need to show that this quantity is smallest when $\q = \p$, i.e. when $q_1 = p_1$, $q_2 = p_2$, and $q_3 = p_3$.</p>
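<p>If it helps to see the formula as a computation, here’s a small Python sketch. The function names are my own, and the example points are just ones I picked for illustration. It works for any number of possibilities, not just three.</p>

```python
def brier_distance(q, vertex):
    """Sum of squared differences between q and a vertex like (1, 0, 0)."""
    return sum((qi - vi) ** 2 for qi, vi in zip(q, vertex))


def expected_inaccuracy(p, q):
    """Expected Brier distance of q from the vertices, weighted by p."""
    n = len(p)
    total = 0.0
    for i in range(n):
        # The i-th vertex has a 1 in position i and 0 everywhere else.
        vertex = [1.0 if j == i else 0.0 for j in range(n)]
        total += p[i] * brier_distance(q, vertex)
    return total


p = (0.6, 0.3, 0.1)
uniform = (1 / 3, 1 / 3, 1 / 3)

print(expected_inaccuracy(p, p))        # how p rates itself
print(expected_inaccuracy(p, uniform))  # how p rates the uniform assignment
```

<p>For this choice of $\p$, the first number comes out smaller than the second, which is what we’d expect if $\p$ rates itself best.</p>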
<h2 id="visualizing-expected-inaccuracy">Visualizing Expected Inaccuracy</h2>
<p>Let’s do some visualization. We’ll take a few examples of $\p$, and graph the expected inaccuracy of other possible points $\q$, using Brier distance to measure inaccuracy.</p>
<p>For example, suppose $\p = (1/ 3, 1/ 3, 1/ 3)$. Then the expected inaccuracy of each point $\q$ looks like this:</p>
<p><img src="http://jonathanweisberg.org/img/accuracy/BrierEI3.png" alt="" /></p>
<p>The horizontal axes represent $q_1$ and $q_2$. The vertical axis represents expected inaccuracy.</p>
<p>Where’s $q_3$?? Not pictured! If we used all three visible dimensions for the elements of $\q$, we’d have nothing left to visualize expected inaccuracy. But $q_3$ is there implicitly. You can always get $q_3$ by calculating $1 - (q_1 + q_2)$, because $\q$ is a probability assignment. So we don’t actually need $q_3$ in the graph!</p>
<p>Now, the red dot is the lowest point on the surface: the smallest possible expected inaccuracy, according to $\p$. But where is that in terms of $q_1$ and $q_2$? Let’s look at the same graph from directly above:</p>
<p><img src="http://jonathanweisberg.org/img/accuracy/BrierEI3-above.png" alt="" /></p>
<p>Hey! Looks like the red dot is located at $q_1 = 1/ 3$ and $q_2 = 1/ 3$, i.e. at $\q = (1/ 3, 1/ 3, 1/ 3)$. Also known as $\p$. So that’s promising: looks like expected inaccuracy is minimized when $\q = \p$, at least in this example.</p>
<p>Let’s do one more example, $\p = (6/ 10, 3/ 10, 1/ 10)$. Then the expected Brier distance of each point $\q$ looks like this:
<img src="http://jonathanweisberg.org/img/accuracy/BrierEI3-2.png" alt="" />
Or, taking the aerial view again:
<img src="http://jonathanweisberg.org/img/accuracy/BrierEI3-2above.png" alt="" />
Yep, looks like the red dot is located at $q_1 = 6/ 10$ and $q_2 = 3/ 10$, i.e. at $\q = (6/ 10, 3/ 10, 1/ 10)$, also known as $\p$. So, once again, it seems expected inaccuracy is minimized when $\q = \p$.</p>
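<p>We can also hunt for the red dot numerically rather than by eye. Here’s a rough Python sketch (the function names and the grid size are my own choices) that tries every probability assignment $\q$ on a grid and reports the one with the lowest expected inaccuracy.</p>

```python
def expected_inaccuracy(p, q):
    """Expected Brier distance of q from the vertices, weighted by p."""
    n = len(p)
    return sum(
        p[i] * sum((q[j] - (1.0 if j == i else 0.0)) ** 2 for j in range(n))
        for i in range(n)
    )


def grid_minimizer(p, steps=30):
    """Search a grid of assignments (q1, q2, q3) for the lowest expected inaccuracy."""
    best_q, best_value = None, float("inf")
    for a in range(steps + 1):
        for b in range(steps + 1 - a):
            # q3 is fixed by q1 and q2, since the three must sum to 1.
            q = (a / steps, b / steps, (steps - a - b) / steps)
            value = expected_inaccuracy(p, q)
            if value < best_value:
                best_q, best_value = q, value
    return best_q


print(grid_minimizer((1 / 3, 1 / 3, 1 / 3)))
print(grid_minimizer((0.6, 0.3, 0.1)))
```

<p>With 30 steps the grid happens to contain both example points exactly, so the search can land right on $\p$. A grid search like this is only a sanity check, of course, not a proof.</p>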
<p>So let’s prove that that’s how it always is.</p>
<h2 id="a-proof">A Proof</h2>
<p>We’ll need a little notation: I’m going to write $\EIpq$ for the expected inaccuracy of point $\q$, according to $\p$.</p>
<p>Now recall our formula for expected inaccuracy:
$$
\begin{align}
\EIpq
&= \quad p_1 \left( (q_1 - 1)^2 + q_2^2 + q_3^2 \right)\\<br />
&\quad + p_2 \left( q_1^2 + (q_2 - 1)^2 + q_3^2 \right)\\<br />
&\quad + p_3 \left( q_1^2 + q_2^2 + (q_3 - 1)^2 \right).
\end{align}
$$
How do we find the point $\q$ that minimizes this mess?</p>
<p>Originally this post used some pretty tedious calculus. But thanks to a hot tip from <a href="https://mathematics.stanford.edu/people/department-directory/name/jonathan-love/" target="_blank">Jonathan Love</a>, we can get by just with algebra.</p>
<p>First we need to expand the squares in our big ugly sum:
$$
\begin{align}
\EIpq
&= \quad p_1 \left( q_1^2 - 2q_1 + 1 + q_2^2 + q_3^2 \right)\\<br />
&\quad + p_2 \left( q_1^2 + q_2^2 - 2q_2 + 1 + q_3^2 \right)\\<br />
&\quad + p_3 \left( q_1^2 + q_2^2 + q_3^2 - 2q_3 + 1 \right).
\end{align}
$$
Then we’ll gather some common terms and rearrange things:
$$
\begin{align}
\EIpq &= (p_1 + p_2 + p_3)\left(q_1^2 + q_2^2 + q_3^2 + 1 \right) - 2p_1q_1 - 2p_2q_2 - 2p_3q_3.\\<br />
\end{align}
$$
Since $p_1 + p_2 + p_3 = 1$, that simplifies to:
$$
\begin{align}
\EIpq &= q_1^2 + q_2^2 + q_3^2 + 1 - 2p_1q_1 - 2p_2q_2 - 2p_3q_3.\\<br />
\end{align}
$$</p>
<p>Now we’ll use <a href="https://mathematics.stanford.edu/people/department-directory/name/jonathan-love/" target="_blank">Jonathan</a>’s ingenious trick. We’re going to add $p_1^2 + p_2^2 + p_3^2 - 1$ to this expression, <em>which doesn’t change where the minimum occurs</em>. If you shift every point on a graph upwards by the same amount, the minimum is still in the same place. (Imagine everybody in the world grows by an inch overnight; the shortest person in the world is still the shortest, despite being an inch taller.)</p>
<p>Then, magically, we get an expression that factors into something tidy:
$$
\begin{align}
&\phantom{=}\phantom{=} p_1^2 + p_2^2 + p_3^2 + q_1^2 + q_2^2 + q_3^2 - 2p_1q_1 - 2p_2q_2 - 2p_3q_3\\<br />
&= (p_1 - q_1)^2 + (p_2 - q_2)^2 + (p_3 - q_3)^2.
\end{align}
$$
And not just tidy, but easy to minimize. It’s a sum of squares, and squares are never negative. So the smallest possible value is $0$, which occurs when all the squares are $0$, i.e. when $q_1 = p_1$, $q_2 = p_2$, and $q_3 = p_3$.</p>
<p>So, the minimum of $\EIpq$ occurs in the same place, namely when $\q = \p$!</p>
<h2 id="the-nth-dimension">The Nth Dimension</h2>
<p>Now we can use the same idea to generalize to any number of dimensions. Since the steps are essentially identical, I’ll keep it short and (I hope) sweet.</p>
<p><strong>Theorem.</strong>
Given a probability assignment $\p = (p_1, \ldots, p_n)$, if inaccuracy is measured using Brier distance, then $\EIpq$ is uniquely minimized when $\q = \p$.</p>
<p><em>Proof.</em>
Let $\p = (p_1, \ldots, p_n)$ be a probability assignment, and let $\EIpq$ be the expected inaccuracy according to $\p$ of probability assignment $\q = (q_1, \ldots, q_n)$, measured using Brier distance.</p>
<p>First we simplify our expression for $\EIpq$ using algebra:
$$
\begin{align}
\EIpq
&= \quad p_1 \left( (q_1 - 1)^2 + q_2^2 + \ldots + q_n^2 \right)\\<br />
&\quad + p_2 \left( q_1^2 + (q_2 - 1)^2 + \ldots + q_n^2 \right)\\<br />
&\quad\quad \vdots\\<br />
&\quad + p_n \left( q_1^2 + q_2^2 + \ldots + q_{n-1}^2 + (q_n - 1)^2 \right)\\<br />
&= (p_1 + \ldots + p_n)\left( q_1^2 + \ldots + q_n^2 + 1\right) - 2 p_1 q_1 - \ldots - 2 p_n q_n\\<br />
&= q_1^2 + \ldots + q_n^2 + 1 - 2 p_1 q_1 - \ldots - 2 p_n q_n.
\end{align}
$$
Now, because $p_1^2 + \ldots + p_n^2 - 1$ is a constant, adding it to $\EIpq$ doesn’t change where the minimum occurs. So we can minimize instead:
$$
\begin{align}
&\phantom{=}\phantom{=} p_1^2 + \ldots + p_n^2 + q_1^2 + \ldots + q_n^2 - 2 p_1 q_1 - \ldots - 2 p_n q_n\\<br />
&= (p_1 - q_1)^2 + \ldots + (p_n - q_n)^2.
\end{align}
$$
Being a sum of squares, this expression can never be less than $0$. And it equals $0$ only when every square is $0$, i.e. when $\q = \p$. So $\q = \p$ is the unique minimum. <span style="float: right;">$\Box$</span></p>
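<p>If you’d like to double-check the algebra, here’s a quick numerical sketch in Python. It draws random probability assignments in a few different dimensions and confirms that adding the constant $p_1^2 + \ldots + p_n^2 - 1$ to the expected inaccuracy gives the sum of squares from the proof. The helper names and the random seed are my own choices.</p>

```python
import random


def expected_inaccuracy(p, q):
    """Expected Brier distance of q from the vertices, weighted by p."""
    n = len(p)
    return sum(
        p[i] * sum((q[j] - (1.0 if j == i else 0.0)) ** 2 for j in range(n))
        for i in range(n)
    )


def squared_distance(p, q):
    """(p_1 - q_1)^2 + ... + (p_n - q_n)^2."""
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q))


def random_assignment(n, rng):
    """A random probability assignment over n possibilities."""
    weights = [rng.random() for _ in range(n)]
    total = sum(weights)
    return [w / total for w in weights]


rng = random.Random(0)  # fixed seed so the run is repeatable
for n in (2, 3, 5, 10):
    p = random_assignment(n, rng)
    q = random_assignment(n, rng)
    constant = sum(pi ** 2 for pi in p) - 1
    gap = abs(expected_inaccuracy(p, q) + constant - squared_distance(p, q))
    print(n, gap < 1e-12)
```

<p>Agreement up to rounding error is all we can ask of floating-point arithmetic; the proof above is what establishes the identity in general.</p>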
<h1 id="conclusion">Conclusion</h1>
<div class="text-center">
<object data="http://www.youtube.com/embed/MkmMxfCgewQ"
width="560" height="315" classboo="text-center"></object>
</div>
<p>So what did we learn? That Brier distance isn’t just “stable” in toy cases like a coin-toss. It’s also stable in toy cases with any finite number of outcomes.</p>
<p>No matter how many outcomes are under consideration, each probability assignment expects itself to do best at minimizing inaccuracy, if we use Brier distance to measure inaccuracy.</p>
<p>To go beyond toy cases, we’d have to extend this result to cases with infinite numbers of possibilities. And I haven’t even begun to think about how to do that.</p>
<p>Instead, next time we’ll look at what happens in $3+$ dimensions when we use Euclidean distance instead of Brier distance. And it’s actually kind of interesting! It turns out Euclidean distance is still improper in $3+$ dimensions, but not necessarily in the same way as in $2$ dimensions. More on that next time…</p>
Gender & Journal Submissions
http://jonathanweisberg.org/post/Author%20Gender/
Thu, 26 Jan 2017 10:36:10 -0500http://jonathanweisberg.org/post/Author%20Gender/
<p>Does an author’s gender affect the fate of their submission to an academic journal? It’s a big question, even if we restrict ourselves to philosophy journals.</p>
<p>But we can make a start by using <a href="http://www.ergophiljournal.org" target="_blank"><em>Ergo</em></a> as one data-point. I’ll examine two questions:</p>
<ul>
<li><p>Question 1: Does gender affect the decision rendered at <em>Ergo</em>? Are men more likely to have their papers accepted, for example?</p></li>
<li><p>Question 2: Does gender affect time-to-decision at <em>Ergo</em>? For example, do women have to wait longer on average for a decision?</p></li>
</ul>
<h1 id="background">Background</h1>
<p>Some important background and caveats before we begin:</p>
<ul>
<li><p>Our data set goes back to Feb. 11, 2015, when <em>Ergo</em> moved to its current online system for handling submissions. We do have records going back to Jun. 2013, when the journal launched. But integrating the data from the two systems is a programming hassle I haven’t faced up to yet.</p></li>
<li><p>We’ll exclude submissions that were withdrawn by the author before a decision could be rendered. Usually, when an author withdraws a submission, it’s so that they can resubmit a trivially-corrected manuscript five minutes later. So this data mostly just gets in the way.</p></li>
<li><p>We’ll also exclude submissions that were still under review as of Jan. 1, 2017, since the data there is incomplete.</p></li>
<li><p>The gender data we’ll be using was gathered manually by <em>Ergo</em>’s managing editors (me and Franz Huber). In most cases we didn’t know the author personally. So we did a quick google to see whether we could infer the author’s gender based on public information, like pronouns and/or pictures. When we weren’t confident that we could, we left their gender as “unknown”.</p></li>
<li><p>This analysis covers only men and women, because there haven’t yet been any cases where we could confidently infer that an author identified as another gender. And the “gender unknown” cases are too few for reliable statistical analysis.</p></li>
<li><p>Since we only have data for the gender of the submitting author, our analysis will overlook co-authors.</p></li>
</ul>
<p>With that in mind, a brief overview: our data set contains $696$ submissions over almost two years (Feb. 11, 2015 up to Jan. 1, 2017), but only $639$ of these are included in this analysis. The $52$ submissions that were in-progress as of Jan. 1, 2017, or were withdrawn by the author, have been excluded. Another $5$ cases where the author’s gender was unknown were also excluded.</p>
<h1 id="gender-decisions">Gender & Decisions</h1>
<p>Does an author’s gender affect the journal’s decision about whether their submission is accepted? We can slice this question a few different ways:</p>
<ol>
<li><p>Does gender affect the first-round decision to reject/accept/R&R?</p></li>
<li><p>Does gender affect the likelihood of desk-rejection, specifically?</p></li>
<li><p>Does gender affect the chance of converting an R&R into an accept?</p></li>
<li><p>Does gender affect the ultimate decision to accept/reject (whether via an intervening R&R or not)?</p></li>
</ol>
<p>The short answer to all these questions is: no, at least not in a statistically significant way. But there are some wrinkles. So let’s take each question in turn.</p>
<h2 id="first-round-decisions">First-Round Decisions</h2>
<p>Does gender affect the first-round decision to reject/accept/R&R?</p>
<p><em>Ergo</em> has two kinds of R&R, Major Revisions and Minor Revisions. Here are the raw numbers:</p>
<table>
<thead>
<tr>
<th align="left"></th>
<th align="right">Reject</th>
<th align="right">Major Revisions</th>
<th align="right">Minor Revisions</th>
<th align="right">Accept</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">Female</td>
<td align="right">76</td>
<td align="right">10</td>
<td align="right">2</td>
<td align="right">0</td>
</tr>
<tr>
<td align="left">Male</td>
<td align="right">438</td>
<td align="right">41</td>
<td align="right">15</td>
<td align="right">5</td>
</tr>
</tbody>
</table>
<p>Graphically, in terms of percentages:</p>
<p><img src="http://jonathanweisberg.org/img/author_gender_files/unnamed-chunk-4-1.png" alt="" /><!-- --></p>
<p>There are differences here, of course: women were asked to make major revisions more frequently than men, for example. And men received verdicts of minor revisions or outright acceptance more often than women.</p>
<p>Are these differences significant? They don’t look it from the bar graph. And a standard chi-square test agrees.<sup class="footnote-ref" id="fnref:1"><a rel="footnote" href="#fn:1">1</a></sup></p>
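<p>For anyone curious what the chi-square test is doing with the table above, here’s a bare-bones sketch in Python (the post itself was done in R; the function name here is my own). It compares each observed count with the count we’d expect if decisions were independent of gender, and adds up the scaled squared differences.</p>

```python
# Observed first-round decisions, copied from the table above.
# Columns: Reject, Major Revisions, Minor Revisions, Accept.
observed = {
    "Female": [76, 10, 2, 0],
    "Male": [438, 41, 15, 5],
}


def chi_square(table):
    """Pearson chi-square statistic, degrees of freedom, and N for a table."""
    rows = list(table.values())
    row_totals = [sum(row) for row in rows]
    col_totals = [sum(col) for col in zip(*rows)]
    n = sum(row_totals)
    stat = 0.0
    for r, row in enumerate(rows):
        for c, count in enumerate(row):
            # Expected count if row and column were independent.
            expected = row_totals[r] * col_totals[c] / n
            stat += (count - expected) ** 2 / expected
    dof = (len(rows) - 1) * (len(col_totals) - 1)
    return stat, dof, n


stat, dof, n = chi_square(observed)
print(f"chi2({dof}, N = {n}) = {stat:.2f}")
```

<p>The statistic this produces should line up with the one reported in the footnote. Turning it into a $p$-value requires the chi-square distribution, which I’ve left out to keep the sketch short.</p>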
<h2 id="desk-rejections">Desk Rejections</h2>
<p>Things are a little more interesting if we separate out desk rejections from rejections-after-external-review. The raw numbers:</p>
<table>
<thead>
<tr>
<th align="left"></th>
<th align="right">Desk Reject</th>
<th align="right">Non-desk Reject</th>
<th align="right">Major Revisions</th>
<th align="right">Minor Revisions</th>
<th align="right">Accept</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">Female</td>
<td align="right">61</td>
<td align="right">15</td>
<td align="right">10</td>
<td align="right">2</td>
<td align="right">0</td>
</tr>
<tr>
<td align="left">Male</td>
<td align="right">311</td>
<td align="right">127</td>
<td align="right">41</td>
<td align="right">15</td>
<td align="right">5</td>
</tr>
</tbody>
</table>
<p>In terms of percentages for men and women:</p>
<p><img src="http://jonathanweisberg.org/img/author_gender_files/unnamed-chunk-9-1.png" alt="" /><!-- --></p>
<p>The differences here are more pronounced. For example, women had their submissions desk-rejected more frequently, a difference of about 8.5%.</p>
<p>But once again, the differences are not statistically significant according to the standard chi-square test.<sup class="footnote-ref" id="fnref:2"><a rel="footnote" href="#fn:2">2</a></sup></p>
<h2 id="ultimate-decisions">Ultimate Decisions</h2>
<p>What if we just consider a submission’s ultimate fate—whether it’s accepted or rejected in the end? Here the results are pretty clear:</p>
<table>
<thead>
<tr>
<th align="left"></th>
<th align="right">Reject</th>
<th align="right">Accept</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">Female</td>
<td align="right">78</td>
<td align="right">5</td>
</tr>
<tr>
<td align="left">Male</td>
<td align="right">450</td>
<td align="right">38</td>
</tr>
</tbody>
</table>
<p><img src="http://jonathanweisberg.org/img/author_gender_files/unnamed-chunk-13-1.png" alt="" /><!-- --></p>
<p>Pretty obviously there’s no significant difference, and a chi-square test agrees.<sup class="footnote-ref" id="fnref:3"><a rel="footnote" href="#fn:3">3</a></sup></p>
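<p>One wrinkle with 2×2 tables like this one: R’s <code>chisq.test</code> applies a continuity correction to them by default, shrinking each gap between observed and expected counts by up to one half. The statistic in the footnote is consistent with that correction having been applied. Here’s a sketch of the corrected statistic in Python, with a function name of my own invention.</p>

```python
def yates_chi_square(a, b, c, d):
    """Continuity-corrected chi-square for the 2x2 table [[a, b], [c, d]]."""
    observed = [[a, b], [c, d]]
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    n = a + b + c + d
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            gap = abs(observed[i][j] - expected)
            # Shrink the gap by one half, but never past zero.
            corrected = gap - min(0.5, gap)
            stat += corrected ** 2 / expected
    return stat


# Ultimate decisions from the table above: (Reject, Accept) for women, then men.
print(round(yates_chi_square(78, 5, 450, 38), 2))
```

<p>When every gap is smaller than one half, the corrected statistic is exactly $0$, so very small tables can produce $\chi^2 = 0$ and $p = 1$.</p>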
<h2 id="conversions">Conversions</h2>
<p>Our analysis so far suggests that men and women probably have about equal chance of converting an R&R into an accept. Looking at the numbers directly corroborates that thought:</p>
<table>
<thead>
<tr>
<th align="left"></th>
<th align="right">Reject</th>
<th align="right">Accept</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">Female</td>
<td align="right">2</td>
<td align="right">5</td>
</tr>
<tr>
<td align="left">Male</td>
<td align="right">12</td>
<td align="right">33</td>
</tr>
</tbody>
</table>
<p><img src="http://jonathanweisberg.org/img/author_gender_files/unnamed-chunk-16-1.png" alt="" /><!-- --></p>
<p>As before, a standard chi-square test agrees.<sup class="footnote-ref" id="fnref:4"><a rel="footnote" href="#fn:4">4</a></sup> Though, of course, the numbers here are small and shouldn’t be given too much weight.</p>
<h2 id="conclusion-so-far">Conclusion So Far</h2>
<p>None of the data so far yielded a significant difference between men and women. None even came particularly close (see the footnotes for the numerical details). So it seems the journal’s decisions are independent of gender, or nearly so.</p>
<h1 id="gender-time-to-decision">Gender & Time-to-Decision</h1>
<p>Authors don’t just care what decision is rendered, of course. They also care that decisions are made quickly. Can men and women expect similar wait-times?</p>
<p>The average time-to-decision is 23.3 days. But for men it’s 23.9 days while for women it’s only 19.6. This looks like a significant difference. And although it isn’t quite significant according to a standard $t$ test, it very nearly is.<sup class="footnote-ref" id="fnref:5"><a rel="footnote" href="#fn:5">5</a></sup></p>
<p>What might be going on here? Let’s look at the observed distributions for men and women:</p>
<p><img src="http://jonathanweisberg.org/img/author_gender_files/unnamed-chunk-19-1.png" alt="" /><!-- --></p>
<p>A striking difference is that there are so many more submissions from men than from women. But otherwise these distributions actually look quite similar. Each is a bimodal distribution with one peak for desk-rejections around one week, and another, smaller peak for externally reviewed submissions around six or seven weeks.</p>
<p>We noticed earlier that women had more desk-rejections by about 8.5%. And while that difference wasn’t statistically significant, it may still be what’s causing the almost-significant difference we see with time-to-decision (especially if men also have a few extra outliers, as seems to be the case).</p>
<p>To test this hypothesis, we can separate out desk-rejections and externally reviewed submissions. Graphically:</p>
<p><img src="http://jonathanweisberg.org/img/author_gender_files/unnamed-chunk-20-1.png" alt="" /><!-- --><img src="http://jonathanweisberg.org/img/author_gender_files/unnamed-chunk-20-2.png" alt="" /><!-- --></p>
<p>Aside from the raw numbers, the distributions for men and for women look very similar. And if we run separate $t$ tests for desk-rejections and for externally reviewed submissions, gender differences are no longer close to significance. For desk-rejections $p = 0.24$. And for externally reviewed submissions $p = 0.46$.</p>
<h1 id="conclusions">Conclusions</h1>
<p>Apparently an author’s gender has little or no effect on the content or speed of <em>Ergo</em>’s decision. I’d <em>like</em> to think this is a result of the journal’s <a href="http://www.ergophiljournal.org/review.html" target="_blank">strong commitment to triple-anonymous review</a>. But without data from other journals to make comparisons, we can’t really infer much about potential causes. And, of course, we can’t generalize to other journals with any confidence, either.</p>
<h1 id="technical-notes">Technical Notes</h1>
<p>This post was written in R Markdown and the source is <a href="https://github.com/jweisber/rgo/blob/master/author%20gender/author%20gender.Rmd" target="_blank">available on GitHub</a>. I’m new to both R and classical statistics, and this post is a learning exercise for me. So I encourage you to check the code and contact me with corrections.</p>
<div class="footnotes">
<hr />
<ol>
<li id="fn:1"><p>Specifically, $\chi^2(3, N = 587) = 1.89$, $p = 0.6$. This raises the question of power, and for a small effect size ($w = .1$) power is only about $0.51$. But it increases quickly to $0.99$ at $w = .2$.</p>
<p>Given the small numbers in some of the columns though, especially the Accept column, we might prefer a different test than $\chi^2$. The more precise $G$ test yields $p = 0.46$, still fairly large. And Fisher’s exact test yields $p = 0.72$.</p>
<p>We might also do an ordinal analysis, since decisions have a natural desirability ordering for authors: Accept > Minor Revisions > Major Revisions > Reject. We can test for a linear trend by assigning integer ranks from 4 down through 1 <a href="http://ca.wiley.com/WileyCDA/WileyTitle/productCd-0470463635.html" target="_blank">(Agresti 2007)</a>. A test of the <a href="https://onlinecourses.science.psu.edu/stat504/node/91" target="_blank">Mantel-Haenszel statistic</a> $M^2$ then yields $p = 0.82$.</p>
<a class="footnote-return" href="#fnref:1"><sup>[return]</sup></a></li>
<li id="fn:2"><p>Here we have $\chi^2(4, N = 587) = 4.64$, $p = 0.33$. As before, the power for a small effect ($w = .1$) is only middling, about 0.46, but increases quickly to near certainty ($0.98$) by $w = .2$.</p>
<p>Instead of $\chi^2$ we might again consider a $G$ test, which yields $p = 0.24$, or Fisher’s exact test which yields $p = 0.37$.</p>
<p>For an ordinal test using the ranking Desk Reject < Non-desk Reject < Major Revisions < etc., the Mantel-Haenszel statistic $M^2$ now yields $p = 0.39$.</p>
<a class="footnote-return" href="#fnref:2"><sup>[return]</sup></a></li>
<li id="fn:3">Here we have $\chi^2(1, N = 571) = 0.11$, $p = 0.74$.
<a class="footnote-return" href="#fnref:3"><sup>[return]</sup></a></li>
<li id="fn:4">$\chi^2(1, N = 52) = 0$, $p = 1$.
<a class="footnote-return" href="#fnref:4"><sup>[return]</sup></a></li>
<li id="fn:5">Specifically, $t(137.71) = 1.78$, $p = 0.08$. A $t$ test may not actually be the best choice here, though, since (as we’re about to see) the sampling distributions aren’t normal, but rather bimodal. Still, we can compare this result to non-parametric tests like Wilcoxon-Mann-Whitney ($p = 0.1$) or the bootstrap-$t$ ($p = 0.07$). These $p$-values don’t quite cross the customary $\alpha = .05$ threshold either, but they are still small.
<a class="footnote-return" href="#fnref:5"><sup>[return]</sup></a></li>
</ol>
</div>
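<p>For readers who want to double-check the $\chi^2$ footnotes above, here is a minimal Python sketch that recovers a $p$-value from a reported statistic and its degrees of freedom. It is not the post’s own R code: the function name <code>chi2_sf</code> is made up for illustration, and it relies on the standard closed forms for the upper tail of a $\chi^2$ distribution with integer degrees of freedom.</p>

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) times a finite sum of (x/2)^k / k!
        total = sum(half ** k / math.factorial(k) for k in range(df // 2))
        return math.exp(-half) * total
    # Odd df: an erfc term plus a finite sum over half-integer powers of x/2
    total = sum(
        half ** (k - 0.5) / math.gamma(k + 0.5)
        for k in range(1, (df - 1) // 2 + 1)
    )
    return math.erfc(math.sqrt(half)) + math.exp(-half) * total

# (statistic, degrees of freedom) pairs as reported in footnotes 1 through 4
reported = [(1.89, 3), (4.64, 4), (0.11, 1), (0.0, 1)]
p_values = [chi2_sf(x, df) for x, df in reported]
for (x, df), p in zip(reported, p_values):
    print(f"chi2({df}) = {x}: p = {p:.2f}")
```

<p>This only checks the arithmetic linking each statistic to its $p$-value. It says nothing about power, or about the $G$, Fisher, and Mantel-Haenszel results, which need the underlying counts.</p>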
Accuracy for Dummies, Part 2: from Euclid to Brier
http://jonathanweisberg.org/post/Accuracy%20for%20Dummies%20-%20Part%202/
Wed, 18 Jan 2017 00:00:00 -0500
<p><a href="http://jonathanweisberg.org/post/Accuracy%20for%20Dummies%20-%20Part%201/">Last time</a> we saw that Euclidean distance is an “unstable” way of measuring inaccuracy. Given one assignment of probabilities, you’ll expect some other assignment to be more accurate (unless the first assignment is either perfectly certain or perfectly uncertain).</p>
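<p>As a quick refresher on that instability, here is a small Python sketch. The credence $0.6$ in Heads and the grid of candidate credences are illustrative choices, and the function names are mine.</p>

```python
import math

def euclid(u, v):
    """Euclidean distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def expected_euclid(p, q):
    """Expected Euclidean inaccuracy of (q, 1-q), by the lights of credences (p, 1-p)."""
    return p * euclid((q, 1 - q), (1, 0)) + (1 - p) * euclid((q, 1 - q), (0, 1))

p = 0.6  # illustrative credence in Heads
grid = [i / 100 for i in range(101)]  # candidate values of q from 0 to 1
best_q = min(grid, key=lambda q: expected_euclid(p, q))

print("credence held:", p)
print("credence expected to be most accurate:", best_q)
print("expected inaccuracy of staying put:", round(expected_euclid(p, p), 3))
print("expected inaccuracy of moving:", round(expected_euclid(p, best_q), 3))
```

<p>On this grid the search lands on $q = 1$ rather than $q = 0.6$: by its own lights, the credence $0.6$ expects the extreme assignment to do better.</p>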
<p>That’s why accuraticians don’t use good ol’ Euclidean distance.</p>
<p><img src="https://crookedrunbrewing.files.wordpress.com/2014/05/scientician.png?w=240" alt="Just ask this accuratician" /></p>
<p>Instead they use… well, there are lots of alternatives. But the closest thing to a standard one is <em>Brier distance</em>: the square of Euclidean distance.</p>
<p>Here’s Euclid’s formula for the distance between two points $(a, b)$ and $(c, d)$ in the plane:
$$ \sqrt{ (a - c)^2 + (b - d)^2 }. $$
And here’s Brier’s:
$$ (a - c)^2 + (b - d)^2. $$
So, to get from Euclid to Brier, you just take away the square root.</p>
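<p>In code the relationship is just as short. Here is a minimal Python sketch; the function names and the sample points are mine, chosen for illustration.</p>

```python
import math

def brier(u, v):
    """Brier distance: the sum of squared coordinate differences."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def euclid(u, v):
    """Euclidean distance: the square root of Brier distance."""
    return math.sqrt(brier(u, v))

point, target = (0.6, 0.4), (1, 0)  # illustrative points in the plane
print("Euclid:", round(euclid(point, target), 4))
print("Brier: ", round(brier(point, target), 4))
```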
<p>That makes a world of difference, it turns out. Brier distance isn’t unstable the way Euclidean distance is. But we’ll see that it’s enough like Euclidean distance to vindicate the argument for the laws of probability we began with last time.</p>
<p>But first, a fun fact.</p>
<h1 id="fun-fact">Fun Fact</h1>
<p>Brier distance comes from the world of weather forecasting. Glenn W. Brier worked for the U.S. Weather Bureau, and in <a href="http://docs.lib.noaa.gov/rescue/mwr/078/mwr-078-01-0001.pdf" target="_blank">a 1950 paper</a> he proposed his formula as a way of measuring how well a weather forecaster is doing at predicting the weather.</p>
<p>Suppose you say there’s a 70% chance of rain. If it does rain, you’re hardly wrong, but you’re not exactly right either. Brier suggested assessing a forecaster’s probabilities by taking the square of the difference from $1$ when it rains, and from $0$ when it doesn’t.</p>
<p>Well, actually, he proposed taking the <em>average</em> of those squares. But we’ll follow the recent philosophical literature and keep it simple: we’ll just use the sum of squares rather than its average.</p>
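<p>To make the 70% example concrete, here is a rough Python sketch of the squared-error idea, showing both the sum and Brier’s original average. The three-day run of forecasts and outcomes is invented purely for illustration, and the function name is mine.</p>

```python
def squared_error(forecast, rained):
    """Squared difference between a forecast probability and what happened (1 or 0)."""
    outcome = 1.0 if rained else 0.0
    return (forecast - outcome) ** 2

# The 70% forecast from the text, scored both ways
error_if_rain = squared_error(0.7, True)
error_if_dry = squared_error(0.7, False)
print(f"70% and it rains: {error_if_rain:.2f}")
print(f"70% and it stays dry: {error_if_dry:.2f}")

# Hypothetical three-day record, made up for illustration
forecasts = [0.7, 0.2, 0.9]
outcomes = [True, False, False]
errors = [squared_error(f, o) for f, o in zip(forecasts, outcomes)]
total = sum(errors)          # the sum, as used in these posts
average = total / len(errors)  # the average, as Brier originally proposed
print(f"sum = {total:.2f}, average = {average:.2f}")
```

<p>The two versions differ only by a constant factor for a fixed number of forecasts, so they rank forecasters the same way.</p>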
<p>Now on to the substance. Two facts about Brier distance make it useful as a replacement for Euclidean distance.</p>
<h1 id="euclid-and-brier-are-ordinally-equivalent">Euclid and Brier are Ordinally Equivalent</h1>
<p>First, Brier distance is <em>ordinally equivalent</em> to Euclidean distance. Meaning: whenever a distance is larger according to Euclid, it’s larger according to Brier too. And vice versa.</p>
<p>How do we know that? Because Brier is just Euclid squared, and squaring a larger number always results in a larger number (for positive numbers like distances, anyway). If $D$ is the distance from Toronto to the sun, and $d$ is the distance from Toronto to the moon, then $D^2 > d^2$. It’s further to the sun than to the moon, both in terms of Brier distance and Euclidean distance.</p>
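<p>Here is a small Python sketch of the same point: rank a handful of points by their distance from $(1, 0)$, once with Euclid and once with Brier, and compare the rankings. The candidate points are arbitrary, chosen only for illustration.</p>

```python
import math

def brier(u, v):
    """Sum of squared coordinate differences."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def euclid(u, v):
    """Square root of the Brier distance."""
    return math.sqrt(brier(u, v))

truth = (1, 0)
candidates = [(0.6, 0.4), (0.9, 0.1), (0.2, 0.8), (0.5, 0.5)]  # arbitrary examples

by_euclid = sorted(candidates, key=lambda c: euclid(c, truth))
by_brier = sorted(candidates, key=lambda c: brier(c, truth))

print("closest to farthest (Euclid):", by_euclid)
print("closest to farthest (Brier): ", by_brier)
print("same ranking:", by_euclid == by_brier)
```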
<p>So, when we’re comparing distances from the truth, Brier distance behaves a lot like Euclidean distance. In particular, what we learned from our opening diagram about Euclidean distance holds for Brier distance, too.</p>
<p><img src="http://jonathanweisberg.org/img/accuracy/2D%20Dominance%20Diagram%20-%20400px.png" alt="Opening diagram" /></p>
<p>Not only is $c'$ closer to both vertices than $c^*$ in Euclidean terms, it’s also closer in terms of Brier distance.</p>
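<p>We can check this numerically with a stand-in for the diagram. The Python sketch below assumes a non-probabilistic point $(0.8, 0.6)$, which is my own choice and not necessarily the point drawn in the figure, and compares it with its nearest neighbour on the diagonal.</p>

```python
def brier(u, v):
    """Sum of squared coordinate differences."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def project_onto_diagonal(point):
    """Nearest point on the line x + y = 1."""
    x, y = point
    shift = (x + y - 1) / 2
    return (x - shift, y - shift)

c_star = (0.8, 0.6)  # hypothetical credences that sum to more than 1
c_prime = project_onto_diagonal(c_star)
vertices = [(1, 0), (0, 1)]

for v in vertices:
    print(v, "c*:", round(brier(c_star, v), 2), "c':", round(brier(c_prime, v), 2))

closer_everywhere = all(brier(c_prime, v) < brier(c_star, v) for v in vertices)
print("c' is closer to both vertices:", closer_everywhere)
```

<p>One example isn’t a proof, of course. It just shows the pattern from the diagram surviving the switch to Brier distance.</p>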
<h1 id="brier-is-stable">Brier is Stable</h1>
<p>Second, Brier distance doesn’t lead to the kind of instability that made Euclidean distance problematic. To see why, let’s rerun our expected inaccuracy calculations from last time, but using Brier distance instead of Euclid.</p>
<p>Suppose your credences in Heads and Tails are $p$ and $1-p$. What’s the expected inaccuracy of having some credence $q$ in Heads, and $1-q$ in Tails?</p>
<p>Well, the Brier distance between $(q, 1-q)$ and $(1,0)$ is:
$$(q - 1)^2 + ((1-q) - 0)^2.$$
And the Brier distance between $(q, 1-q)$ and $(0,1)$ is:
$$(q - 0)^2 + ((1-q) - 1)^2.$$
We don’t know which of $(1,0)$ or $(0,1)$ is the “true” one. But we have assigned them the probabilities $p$ and $1-p$, respectively. So we can calculate the expected inaccuracy of $(q, 1-q)$, written $EI(q, 1-q)$:
$$
\begin{align}
EI(q, 1-q) &= p \left( (q - 1)^2 + ((1-q) - 0)^2 \right)\\<br />
&\quad + (1-p) \left( (q - 0)^2 + ((1-q) - 1)^2 \right)\\<br />
&= 2 p (1 - q)^2 + 2(1-p) q^2\\<br />
&= 2 p q^2 - 4pq + 2p + 2q^2 - 2pq^2\\<br />
&= 2q^2 - 4pq + 2p
\end{align}
$$
Now that last line might look like a mess. But it’s really just a quadratic equation, where the variable is $q$. Remember: we’re treating $p$ as a constant since that’s the credence you hold. And we’re looking at potential values of $q$ to see which ones minimize the quantity $EI(q, 1-q)$, given a fixed credence of $p$ in heads.</p>
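<p>If you don’t trust the algebra, here is a quick Python sanity check. It compares the expected inaccuracy computed straight from the definition with the simplified expression $2q^2 - 4pq + 2p$ over a grid of values. The grid and the function names are my own choices.</p>

```python
def brier(u, v):
    """Sum of squared coordinate differences."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def expected_brier(p, q):
    """Expected Brier inaccuracy of (q, 1-q), straight from the definition."""
    return p * brier((q, 1 - q), (1, 0)) + (1 - p) * brier((q, 1 - q), (0, 1))

def simplified(p, q):
    """The last line of the derivation."""
    return 2 * q ** 2 - 4 * p * q + 2 * p

grid = [i / 20 for i in range(21)]  # 0, 0.05, ..., 1
max_gap = max(abs(expected_brier(p, q) - simplified(p, q)) for p in grid for q in grid)
print("largest disagreement on the grid:", max_gap)
```

<p>Any disagreement should be at the level of floating-point rounding.</p>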
<p>So which value of $q$ minimizes this quadratic formula? You might remember from algebra class that a quadratic equation of the form:
$$
ax^2 + bx + c
$$
is a parabola and, when $a > 0$, it opens upward like a bowl, with the bottom of the bowl located at $x = -b/2a$. (Or, if you know some calculus, you can take the derivative and set it equal to $0$. Since the derivative here is $2ax + b$, setting it equal to $0$ yields, again, $x = -b/2a$.)</p>
<p>In the case of our formula, we have $a = 2$ and $b = -4p$. So the minimum happens when $q = 4p/4 = p$. In other words, given credence $p$ in heads, expected inaccuracy is minimized by sticking with that same credence, i.e. assigning $q = p$.</p>
<p>So, to complement our result about Euclidean distance from last time, we have a</p>
<p><strong>Theorem.</strong> Suppose $p \in [0,1]$. Then, according to the probability assignment $(p, 1-p)$, the expected Brier distance of any alternative assignment $(q, 1-q)$ from the points $(1,0)$ and $(0,1)$ is uniquely minimized when $p = q$.</p>
<p><em>Proof.</em> Scroll up! <span style="float: right;">$\Box$</span></p>
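<p>For the skeptical, here is a Python sketch that checks the theorem at a few sample credences. It finds the minimizing $q$ twice: once with the vertex formula $-b/2a$, and once by brute-force search over a grid. The sample values of $p$ and the grid spacing are arbitrary choices of mine.</p>

```python
def expected_brier(p, q):
    """Expected Brier inaccuracy of (q, 1-q) given credences (p, 1-p)."""
    return 2 * p * (1 - q) ** 2 + 2 * (1 - p) * q ** 2

def vertex(a, b):
    """Location of the bottom of the parabola a*x^2 + b*x + c, for a > 0."""
    return -b / (2 * a)

grid = [i / 100 for i in range(101)]  # candidate values of q
results = {}
for p in (0.0, 0.25, 0.6, 1.0):  # arbitrary sample credences
    by_formula = vertex(2, -4 * p)  # a = 2, b = -4p, as in the text
    by_search = min(grid, key=lambda q: expected_brier(p, q))
    results[p] = (by_formula, by_search)
    print(f"p = {p}: vertex at {by_formula}, grid minimum at {by_search}")
```

<p>A grid search is no substitute for the proof, but it’s a handy guard against sign errors.</p>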
<h1 id="proper-scoring-rules">Proper Scoring Rules</h1>
<p>When a measure of inaccuracy is stable like this, it’s called <em>proper</em> (or sometimes: <em>immodest</em>).</p>
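<p>Here is a rough Python sketch of what a propriety check looks like. It runs the same test on Brier and on one other rule, the logarithmic rule, which is a standard example of a proper rule but isn’t otherwise discussed in this post. The credence $0.3$ and the grid are illustrative choices, and the function names are mine.</p>

```python
import math

def brier_score(q, heads):
    """Brier inaccuracy of (q, 1-q) when the coin lands heads (or tails)."""
    truth = (1, 0) if heads else (0, 1)
    return (q - truth[0]) ** 2 + ((1 - q) - truth[1]) ** 2

def log_score(q, heads):
    """Logarithmic inaccuracy: minus the log of the credence in what happened."""
    return -math.log(q if heads else 1 - q)

def expected(score, p, q):
    """Expected inaccuracy of credence q in heads, by the lights of credence p."""
    return p * score(q, True) + (1 - p) * score(q, False)

p = 0.3  # illustrative credence in heads
grid = [i / 100 for i in range(1, 100)]  # open interval, to avoid log(0)
minimizers = {
    score.__name__: min(grid, key=lambda q: expected(score, p, q))
    for score in (brier_score, log_score)
}
print(minimizers)
```

<p>Both rules should send you back to the credence you started with, which is what stability amounts to.</p>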
<p>There are lots of other proper ways of measuring inaccuracy besides Brier. But Brier tends to be the default among philosophers writing in the accuracy framework, at least as a working example. Why?</p>
<p>My impression (though I’m no guru) is that it’s the default because:</p>
<ol>
<li>Brier is a lot like Euclidean distance, as we saw. So it’s easier and more intuitive to work with than some of the alternatives.</li>
<li>Brier tends to be representative of other proper/immodest rules. If you discover something philosophically interesting using Brier, there’s a good chance it holds for many other proper scoring rules.</li>
<li>Brier has other nice mathematical properties which, according to authors like Richard Pettigrew, make it The One True Measure of Inaccuracy. (It may have some odd features too, though: see <a href="http://m-phi.blogspot.ca/2015/03/a-strange-thing-about-brier-score.html" target="_blank">this post</a> by Brian Knab and Miriam Schoenfield, for example.)</li>
</ol>
<p>How does our starting argument for the laws of probability fare if we use other proper scoring rules, besides Brier? Really well, it turns out!</p>
<p>The key fact our diagram illustrates doesn’t just hold for Euclidean distance and Brier distance. Speaking <em>very</em> loosely: it holds on any proper way of measuring distance (but do see sections 8 and 9 of <a href="https://philpapers.org/rec/JOYAAC" target="_blank">Joyce’s 2009</a> for the details before getting carried away with this generalization; or see Theorem 4.3.5 of <a href="https://global.oup.com/academic/product/accuracy-and-the-laws-of-credence-9780198732716" target="_blank">Pettigrew 2016</a>).</p>
<p>Proving that requires grinding through a good deal of math, though. So in these posts we’re going to stick with Brier distance, at least for a while.</p>
<h1 id="begging-the-question">Begging the Question?</h1>
<p>We started these posts with an illustration of an influential argument for the laws of probability. But we quickly switched to <em>assuming</em> those very same laws in the arguments that followed.</p>
<p>For example, to illustrate the instability of Euclidean distance, I chose a point on the diagonal of our diagram, $(.6, .4)$. And in the theorem that generalized that example, I assumed probabilistic assignments like $(p, 1-p)$ and $(q, 1-q)$, which add up to $1$.</p>
<p>So didn’t we beg the question when we motivated switching from Euclid to Brier?</p>
<p>To some extent: yes. We are assuming that reasonable ways of measuring inaccuracy can’t be so hostile to the laws of probability that they make almost all probability assignments unstable.</p>
<p>But also: no. We aren’t assuming that the laws of probability are absolute and inviolable, just that they’re reasonable <em>sometimes</em>. Euclidean distance would rule out probabilistic credences on pretty much all occasions. So it conflicts with the very modest thought that following the laws of probability is <em>occasionally</em> reasonable. So, even if you’re just a little bit open to the idea of probability theory, Euclidean distance will seem pretty unfriendly.<sup class="footnote-ref" id="fnref:1"><a rel="footnote" href="#fn:1">1</a></sup></p>
<p>Perhaps most importantly, though: the motivation I’ve given you here for moving from Euclid to Brier isn’t the official one you’ll find in an actual, bottom-up argument for probability theory, like <a href="https://richardpettigrew.wordpress.com/accuracy-book/" target="_blank">Richard Pettigrew’s</a>. His argument starts from a much more abstract place. He starts with axioms that any measure of inaccuracy must obey, and then narrows things down to Brier.</p>
<p>So there’s the official story and the unofficial story. This post gives you the unofficial story, to help you get started. Because the official story is often really hard to understand. Not only is the math way more abstract, but the philosophical motivations are often hard to suss out. Because—and this is just between you and me now—the people telling the official story actually started out with the unofficial story, and then worked backwards until they came up with an officially respectable story that doesn’t beg the question quite so obviously.</p>
<p>Ok, that’s unfair. Here’s a more even-handed (and better-informed) way of putting it, from <a href="http://ndpr.nd.edu/news/70705-accuracy-and-the-laws-of-credence/" target="_blank">Kenny Easwaran’s review</a> of Pettigrew’s book:</p>
<blockquote>
<p>Some philosophers have a vision of what they do as starting from unassailable premises, and giving an ironclad argument for a conclusion. However, I think we’ve all often seen cases where these arguments are weaker than they seem to the author, and with the benefit of a bit of distance, one can often recognize how the premises were in fact motivated by an attempt to justify the conclusion, which was chosen in advance. Pettigrew avoids the charade of pretending to have come up with the premises independently of recognizing that they lead to the conclusions of his arguments. Instead, he is open about having chosen target conclusions in advance […] and investigated what collection of potentially plausible principles about accuracy and epistemic decision theory will lead to those conclusions.</p>
</blockquote>
<div class="footnotes">
<hr />
<ol>
<li id="fn:1">This argument is essentially drawn from <a href="https://philpapers.org/rec/JOYAAC" target="_blank">(Joyce 2009)</a>.
<a class="footnote-return" href="#fnref:1"><sup>[return]</sup></a></li>
</ol>
</div>
The Thursday Conundrum
http://jonathanweisberg.org/post/The%20Thursday%20Conundrum/
Mon, 16 Jan 2017 10:14:00 -0500
<p>In <a href="http://jonathanweisberg.org/post/An%20Editors%20Favourite%20Days/">an earlier post</a> we saw that Mondays and Thursdays are good for editors, at least at <a href="http://www.ergophiljournal.org/" target="_blank"><em>Ergo</em></a>. Potential referees say yes more often when invited on these days. But why?</p>
<p>Mondays aren’t too puzzling. It’s the start of a new week, so people are fresh, and maybe just a little deluded about how productive the coming week will prove to be.</p>
<p>But Thursdays? They don’t seem especially special. I tried <a href="http://jonathanweisberg.org/post/An%20Editors%20Favourite%20Days/#theory">speculating <em>a priori</em></a> about what might be going on there. But it’d be nice to have a hypothesis that’s grounded in some data.</p>
<h1 id="virtual-mondays">Virtual Mondays?</h1>
<p>At first I thought it might be something subtle. Maybe the day the invite is sent isn’t as important as when the referee <em>responds</em>. An invitation sent on Thursday might not be answered until the following Monday. Whereas invites sent on Monday might tend to be answered the same day. Then Thursday would end up being a kind of virtual Monday, as far as referees responding to invites goes.</p>
<p>That didn’t seem to fit the data, though. For one thing, if you look at which days referees are least likely to <em>respond</em> negatively, it’s Mondays and Thursdays again:
<img src="http://jonathanweisberg.org/img/the_thursday_conundrum_files/unnamed-chunk-2-1.png" alt="" /><!-- --></p>
<p>For another, if you look at when referees respond to requests sent on Monday, it’s the same pattern as for requests sent on Thursday. In either case, referees typically respond the same day, or in the next couple of days. Here’s the pattern for Monday-invites:
<img src="http://jonathanweisberg.org/img/the_thursday_conundrum_files/unnamed-chunk-3-1.png" alt="" /><!-- -->
And here are Thursday-invites:
<img src="http://jonathanweisberg.org/img/the_thursday_conundrum_files/unnamed-chunk-4-1.png" alt="" /><!-- -->
In case you’re curious, here are all the days of the week, tiled according to day-of-invite:
<img src="http://jonathanweisberg.org/img/the_thursday_conundrum_files/unnamed-chunk-5-1.png" alt="" /><!-- -->
The pattern is pretty similar regardless of the day the invite is sent (except for the predictable effect of weekends, which tend to dampen responses across the board).</p>
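<p>The plots above come from the post’s R source. As a rough idea of the underlying computation, here is a Python sketch that groups response delays by the day of the invite. The records are entirely made up for illustration and are not <em>Ergo</em>’s data.</p>

```python
from collections import defaultdict
from datetime import date

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

# Hypothetical (invite date, response date) pairs, invented for illustration
records = [
    (date(2017, 1, 9), date(2017, 1, 9)),    # Monday invite, same-day reply
    (date(2017, 1, 9), date(2017, 1, 11)),   # Monday invite, Wednesday reply
    (date(2017, 1, 12), date(2017, 1, 12)),  # Thursday invite, same-day reply
    (date(2017, 1, 12), date(2017, 1, 16)),  # Thursday invite, reply the next Monday
]

# Collect the delay in days under the weekday on which the invite went out
lags_by_invite_day = defaultdict(list)
for invited, responded in records:
    lags_by_invite_day[DAYS[invited.weekday()]].append((responded - invited).days)

for day in DAYS:
    if day in lags_by_invite_day:
        print(day, lags_by_invite_day[day])
```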
<h1 id="the-beleaguered">The Beleaguered</h1>
<p>So my current hypothesis is much more flat-footed: it’s mainly a matter of when referees are busy. Monday they’re feeling fresh from the weekend, as I suggested. But why would Thursday be less overwhelming for referees? Maybe because they get fewer invitations then.</p>
<p>Let’s test that hypothesis. Here are the total numbers of invites sent out each day of the week, over the last two years at <em>Ergo</em>:
<img src="http://jonathanweisberg.org/img/the_thursday_conundrum_files/unnamed-chunk-6-1.png" alt="" /><!-- -->
The overall pattern is pretty much what you’d expect. Weekends are quiet (because even editors have lives). Then things pick up on Monday and Tuesday as the workweek begins, before declining again as the week wears on.</p>
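<p>The tally behind a chart like this is simple to compute. Here is a minimal Python sketch, again with invented invite dates standing in for the journal’s records (the real analysis lives in the R source).</p>

```python
from collections import Counter
from datetime import date

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

# Hypothetical invite dates, invented for illustration
invite_dates = [
    date(2017, 1, 9), date(2017, 1, 9),    # two on a Monday
    date(2017, 1, 10),                     # one on a Tuesday
    date(2017, 1, 12),                     # one on a Thursday
    date(2017, 1, 13), date(2017, 1, 13),  # two on a Friday
    date(2017, 1, 14),                     # one on a Saturday
]

counts = Counter(DAYS[d.weekday()] for d in invite_dates)
totals = [(day, counts[day]) for day in DAYS]  # keep calendar order, zeros included

for day, n in totals:
    print(f"{day:<10}{n}")
```

<p>Listing the days in calendar order, zeros and all, keeps quiet days visible rather than silently dropping them.</p>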
<p>Note the uptick from Thursday to Friday, though: a difference of about 30 invitations. On a scale ranging from ~100 to ~250, that may be a non-trivial difference. And the same pattern shows up in both years we have data for:
<img src="http://jonathanweisberg.org/img/the_thursday_conundrum_files/unnamed-chunk-7-1.png" alt="" /><!-- -->
So maybe Thursdays are good because things are quieting down as the weekend approaches, and referees are receiving fewer requests. But by Friday it’s too late. The editors of the world are scrambling to lock down referees before the weekend. And, probably, referees aren’t keen to clutter up their desks just as they’re about to go into a weekend break.</p>
<h1 id="how-is-a-raven-like-a-writing-desk">How is a Raven Like a Writing Desk?</h1>
<p>But if Thursdays are good because fewer requests go out then, shouldn’t Mondays be terrible? We just saw that the start of the week is the busiest time in terms of the number of requests sent to referees.</p>
<p>My guess is that Monday and Thursday are to be explained somewhat differently. Thursdays are distinguished by their quietude, whereas Mondays are marked by vim and vigour. People are fresh, as I said. But also, the onslaught of the week’s workload hasn’t really hit yet.</p>
<p>In support of this last hypothesis, notice that the following pattern is quite robust: weekends are quiet, followed by a burst of activity early in the week, followed by decline towards the next weekend. We saw this pattern with editors sending requests to referees. But we see it other places too.</p>
<p>For example, here’s how the quantity of submissions the journal receives varies over the week:
<img src="http://jonathanweisberg.org/img/the_thursday_conundrum_files/unnamed-chunk-8-1.png" alt="" /><!-- -->
And here are the numbers of referee reports completed each day:
<img src="http://jonathanweisberg.org/img/the_thursday_conundrum_files/unnamed-chunk-9-1.png" alt="" /><!-- -->
Pretty clearly, authors, editors, and referees are all quietest on the weekends, and most active at the week’s start. (We also see in this last graph, as with editors contacting referees, that there’s a feeble resurgence toward week’s end—presumably in an attempt to clear the docket before the weekend.)</p>
<h1 id="conclusion">Conclusion</h1>
<p>So here’s my theory, at least for now.</p>
<p>Referees are game on Mondays for the obvious reasons: they’ve had the weekend to recharge and catch up, and the onslaught of Monday’s and Tuesday’s new submissions—and the corresponding wave of invitations to referees—hasn’t reverberated out into the referee-verse just yet. (Not to mention other demands, like teaching.)</p>
<p>Referees are game on Thursdays, too, but for somewhat different reasons. As the week wears on, authors and editors wind down, so referees find fewer invites in their inboxes. They’ve also completed their existing assignments earlier in the week, maybe even submitted their own papers. So they’re game, until the next day, Friday, when editors do their last-minute, pre-weekend scramble—which is especially ill-timed since referees are switching out of work-mode anyway.</p>
<p>It’s a bit unlovely and disunified, this explanation. But not entirely. Mondays and Thursdays do have something in common on this story. They’re both days when things are calmer for referees, albeit calm in different ways and for somewhat different reasons.</p>
<h1 id="technical-notes">Technical Notes</h1>
<p>This post was written in R Markdown and the source is <a href="https://github.com/jweisber/rgo/blob/master/thursday%20conundrum/the%20thursday%20conundrum.Rmd" target="_blank">available on GitHub</a>. I’m new to R and data science, and this post is a learning exercise for me. So I encourage you to check the code and contact me with corrections.</p>