Jonathan Weisberg
http://jonathanweisberg.org/index.xml
Recent content on Jonathan Weisberg
Accuracy for Dummies, Part 4: Euclid in the Round
http://jonathanweisberg.org/post/Accuracy%20for%20Dummies%20-%20Part%204/
Thu, 23 Feb 2017 00:00:00 -0500
<p>Last time we took Brier distance beyond two dimensions. <a href="http://jonathanweisberg.org/post/Accuracy for Dummies - Part 3/">We showed</a> that it’s “proper” in any finite number of dimensions. Today we’ll show that Euclidean distance is “improper” in any finite number of dimensions.</p>
<p>When I first sat down to write this post, I had in mind a straightforward generalization of <a href="http://jonathanweisberg.org/post/Accuracy for Dummies - Part 1/">our previous result</a> for Euclidean distance in two dimensions. And I figured it would be easy to prove.</p>
<p>Not so.</p>
<p>My initial conjecture was false, and worse, when I asked my accuracy-guru friends for the truth, nobody seemed to know. (They did offer lots of helpful suggestions, though.)</p>
<p>So today we’re muddling through on our own even more than usual. Here goes.</p>
<h1 id="background">Background</h1>
<p>Let’s recall where we are. We’ve been considering different ways of measuring the inaccuracy of a probability assignment given a possibility, or a “possible world”.</p>
<p>Let’s start today by regimenting our terminology. We’ve used these terms semi-formally for a while now. But let’s gather them here for reference, and to make them a little more precise.$
\renewcommand{\vec}[1]{\mathbf{#1}}
\newcommand{\p}{\vec{p}}
\newcommand{\q}{\vec{q}}
\newcommand{\u}{\vec{u}}
\newcommand{\EIpq}{EI_{\p}(\q)}
\newcommand{\EIpp}{EI_{\p}(\p)}
$</p>
<p>Given a number of dimensions $n$:</p>
<ul>
<li>A <em>probability assignment</em> $\p = (p_1, \ldots, p_n)$ is a vector of non-negative real numbers that sum to $1$.</li>
<li>A <em>possible world</em> is a vector $\u$ of length $n$ containing all zeros except for a single $1$. (A <a href="https://en.wikipedia.org/wiki/Unit_vector" target="_blank">unit vector</a> of length $n$, in other words.)</li>
<li>A <em>measure of inaccuracy</em> $D(\p, \u)$ is a function that takes a probability assignment and a possible world and returns a real number.</li>
</ul>
<p>We’ve been considering two measures of inaccuracy. The first is the familiar Euclidean distance between $\p$ and $\u$. For example, when $\u = (1, 0, \ldots, 0)$ we have:
$$ \sqrt{(p_1 - 1)^2 + (p_2 - 0)^2 + \ldots + (p_n - 0)^2}.$$
The second, less familiar way of measuring inaccuracy is Brier distance, which is just the square of Euclidean distance:
$$ (p_1 - 1)^2 + (p_2 - 0)^2 + \ldots + (p_n - 0)^2.$$</p>
<p>What we found in $n = 2$ dimensions is that Euclidean distance is “unstable” in a way that Brier is not. If we measure inaccuracy using Euclidean distance, a probability assignment can expect some <em>other</em> probability assignment to do better accuracy-wise, i.e. to have lower inaccuracy.</p>
<p>In fact, given almost any probability assignment, the way to minimize expected inaccuracy is to leap to certainty in the most likely possibility. Given $(2/3, 1/3)$, for example, the way to minimize expected inaccuracy is to move to $(1,0)$.</p>
<p>Because Euclidean distance is unstable in this way, it’s called an “improper” measure of inaccuracy. So, two more bits of terminology:</p>
<ul>
<li>Given a probability assignment $\p$ and a measure of inaccuracy $D$, the <em>expected inaccuracy</em> of probability assignment $\q$, written $\EIpq$, is the weighted sum:
$$
\EIpq = p_1 D(\q,\u_1) + \ldots + p_n D(\q,\u_n),
$$
where $\u_i$ is the possible world with a $1$ at index $i$.</li>
<li>A measure of inaccuracy $D$ is <em>improper</em> if there is a probability assignment $\p$ such that for some assignment $\q \neq \p$, $\EIpq < \EIpp$ when inaccuracy is measured according to $D$.</li>
</ul>
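<p>To make these definitions concrete, here is a minimal Python sketch. The function names (<code>possible_world</code>, <code>euclidean</code>, <code>brier</code>, <code>expected_inaccuracy</code>) are my own labels for illustration, not anything from the <em>Mathematica</em> notebook. It compares staying at $(2/3, 1/3)$ with leaping to $(1, 0)$ under both measures.</p>

```python
import math

def possible_world(i, n):
    """Return the world u_i: all zeros except a 1 at index i."""
    return tuple(1.0 if k == i else 0.0 for k in range(n))

def euclidean(q, u):
    """Euclidean distance between assignment q and world u."""
    return math.sqrt(sum((qk - uk) ** 2 for qk, uk in zip(q, u)))

def brier(q, u):
    """Brier distance: the square of Euclidean distance."""
    return sum((qk - uk) ** 2 for qk, uk in zip(q, u))

def expected_inaccuracy(p, q, measure):
    """EI_p(q): the p-weighted sum of q's inaccuracy in each world."""
    n = len(p)
    return sum(p[i] * measure(q, possible_world(i, n)) for i in range(n))

p = (2 / 3, 1 / 3)
certain = (1.0, 0.0)
for name, measure in (("Euclidean", euclidean), ("Brier", brier)):
    stay = expected_inaccuracy(p, p, measure)
    leap = expected_inaccuracy(p, certain, measure)
    print(f"{name}: EI_p(p) = {stay:.4f}, EI_p((1,0)) = {leap:.4f}")
```

<p>Under Euclidean distance the leap wins (0.4714 against 0.6285), while under Brier distance staying put wins (0.4444 against 0.6667).</p>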
<p>Last time we showed that Brier is <em>proper</em> in any finite number of dimensions $n$. Today our main task is to show that Euclidean distance is <em><strong>im</strong>proper</em> in any finite number of dimensions $n$.</p>
<p>But first, let’s get a tempting mistake out of the way.</p>
<h1 id="a-conjecture-and-its-refutation">A Conjecture and Its Refutation</h1>
<p>In <a href="http://jonathanweisberg.org/post/Accuracy for Dummies - Part 1/">our first post</a>, we saw that Euclidean distance isn’t just improper in two dimensions. It’s also <em>extremizing</em>: the assignment $(2/3, 1/3)$ doesn’t just expect <em>some</em> other assignment to do better accuracy-wise. It expects the assignment $(1,0)$ to do best!</p>
<p>At first I thought we’d be proving a straightforward generalization of that result today:</p>
<p><strong>Conjecture 1 (False).</strong> Let $(p_1, \ldots, p_n)$ be a probability assignment with a unique largest element $p_i$. If we measure inaccuracy by Euclidean distance, then $\EIpq$ is minimized when $\q = \u_i$.</p>
<p>Intuitively: expected inaccuracy is minimized by leaping to certainty in the most probable possibility. Turns out this is false in three dimensions. Here’s a</p>
<p><strong>Counterexample.</strong> Let’s define:
$$
\begin{align}
\p &= (5/12, 4/12, 3/12),\\<br />
\p' &= (6/12, 4/12, 2/12),\\<br />
\u_1 &= (1, 0, 0).
\end{align}
$$</p>
<p>Then we can calculate (or better, <a href="https://github.com/jweisber/a4d/blob/master/Euclid%20in%20the%20Round.nb" target="_blank">have <em>Mathematica</em> calculate</a>):
$$
\begin{align}
\EIpp &\approx .804,\\<br />
EI_{\p}(\p') &\approx .800,\\<br />
EI_{\p}(\u_1) &\approx .825.
\end{align}
$$
In this case $\EIpp < EI_{\p}(\u_1)$. So leaping to certainty doesn’t minimize expected inaccuracy (as measured by Euclidean distance).</p>
<p>Of course, staying put doesn’t minimize it either, since $EI_{\p}(\p') < \EIpp$.</p>
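<p>If you’d rather not open <em>Mathematica</em>, the three numbers in the counterexample can be checked with a few lines of Python. This is only a sketch; <code>euclidean_ei</code> is a helper name I’ve made up for the expected Euclidean inaccuracy.</p>

```python
import math

def euclidean_ei(p, q):
    """Expected Euclidean inaccuracy of q by the lights of p."""
    total = 0.0
    for i, p_i in enumerate(p):
        # Squared distance from q to the world with a 1 at index i.
        sq = sum((q_k - (1.0 if k == i else 0.0)) ** 2 for k, q_k in enumerate(q))
        total += p_i * math.sqrt(sq)
    return total

p = (5 / 12, 4 / 12, 3 / 12)
p_prime = (6 / 12, 4 / 12, 2 / 12)
u_1 = (1.0, 0.0, 0.0)

ei_stay = euclidean_ei(p, p)
ei_prime = euclidean_ei(p, p_prime)
ei_leap = euclidean_ei(p, u_1)
print(f"{ei_stay:.3f} {ei_prime:.3f} {ei_leap:.3f}")
```

<p>This prints <code>0.804 0.800 0.825</code>, matching the values above: moving to the second assignment beats staying put, which in turn beats leaping to certainty.</p>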
<p>So what <em>does</em> minimize it in this example? I asked <em>Mathematica</em> to minimize $\EIpq$ and got… nothing for days. Eventually I gave up waiting and asked instead for <a href="https://github.com/jweisber/a4d/blob/master/Euclid%20in%20the%20Round.nb" target="_blank">a numerical approximation of the minimum</a>. One second later I got:</p>
<p>$$EI_{\p}(0.575661, 0.250392, 0.173947) \approx 0.797432.$$</p>
<p>I have no idea what that is in more meaningful terms, I’m sorry to say. But at least we know it’s not anywhere near the extreme point $\u_1$ I conjectured at the outset. (See the <strong>Update</strong> at the end for a little more.)</p>
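<p>For readers without <em>Mathematica</em>, here is a rough way to approximate that minimum in plain Python. To be clear, this is not the method the notebook uses: it’s a crude pattern search I’m assuming is good enough here, which repeatedly shifts a bit of probability between two coordinates and keeps any shift that lowers expected inaccuracy. The names <code>minimize_on_simplex</code>, <code>step</code> and <code>tol</code> are illustrative.</p>

```python
import math

def euclidean_ei(p, q):
    """Expected Euclidean inaccuracy of q by the lights of p."""
    total = 0.0
    for i, p_i in enumerate(p):
        sq = sum((q_k - (1.0 if k == i else 0.0)) ** 2 for k, q_k in enumerate(q))
        total += p_i * math.sqrt(sq)
    return total

def minimize_on_simplex(p, start, step=0.1, tol=1e-7):
    """Crude pattern search: shift `step` of probability between two
    coordinates whenever that lowers EI; halve `step` when nothing helps."""
    q = list(start)
    best = euclidean_ei(p, q)
    n = len(q)
    while step > tol:
        improved = False
        for src in range(n):
            for dst in range(n):
                if src == dst or q[src] < step:
                    continue  # never push a coordinate below zero
                trial = list(q)
                trial[src] -= step
                trial[dst] += step
                value = euclidean_ei(p, trial)
                if value < best - 1e-15:
                    q, best, improved = trial, value, True
        if not improved:
            step /= 2
    return tuple(q), best

p = (5 / 12, 4 / 12, 3 / 12)
q_min, ei_min = minimize_on_simplex(p, p)
print([round(x, 3) for x in q_min], round(ei_min, 5))
```

<p>Starting from $\p$ itself, the search should settle close to the point reported above, with an expected inaccuracy of about $0.7974$.</p>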
<h1 id="a-shortcut-and-its-shortcomings">A Shortcut and Its Shortcomings</h1>
<p>So I asked friends who do this kind of thing for a living how they handle the $n$-dimensional case. A couple of them suggested taking a shortcut around it!</p>
<blockquote>
<p>Look, you’ve already handled the two-dimensional case. And that’s just an instance of higher dimensional cases.</p>
<p>Take a probability assignment like (2/3, 1/3). We can also think of it as (2/3, 1/3, 0), or as (2/3, 0, 1/3, 0), etc.</p>
<p>No matter how many zeros we sprinkle around in there, the same thing is going to happen as in the two-dimensional case. Leaping to certainty in the 2/3 possibility will minimize expected inaccuracy. (Because possibilities with no probability make no difference to expected value calculations.)</p>
<p>So no matter how many dimensions we’re working in, there will always be <em>some</em> probability assignment where leaping to certainty minimizes expected inaccuracy. It just might have lots of zeros in it.</p>
<p>So Euclidean distance is, technically, improper in any finite number of dimensions.</p>
</blockquote>
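<p>The shortcut is easy to sanity-check numerically. The sketch below pads $(2/3, 1/3)$ with a zero and brute-forces a grid over the three-dimensional assignments; the helper <code>grid_argmin</code> and the grid resolution are my own choices, and a grid search illustrates the claim rather than proving it.</p>

```python
import math

def euclidean_ei(p, q):
    """Expected Euclidean inaccuracy of q by the lights of p."""
    total = 0.0
    for i, p_i in enumerate(p):
        sq = sum((q_k - (1.0 if k == i else 0.0)) ** 2 for k, q_k in enumerate(q))
        total += p_i * math.sqrt(sq)
    return total

def grid_argmin(p, steps=30):
    """Search every q = (i, j, k) / steps with i + j + k = steps."""
    best_q, best_val = None, float("inf")
    for i in range(steps + 1):
        for j in range(steps + 1 - i):
            q = (i / steps, j / steps, (steps - i - j) / steps)
            val = euclidean_ei(p, q)
            if val < best_val:
                best_q, best_val = q, val
    return best_q, best_val

p = (2 / 3, 1 / 3, 0.0)  # the two-dimensional example padded with a zero
best_q, best_val = grid_argmin(p)
print(best_q, round(best_val, 4))
```

<p>The grid minimum lands on $(1, 0, 0)$ with expected inaccuracy $\sqrt{2}/3 \approx 0.4714$, the same value as in the two-dimensional case.</p>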
<p>At first I thought that was good enough for philosophy, though I still wanted to know how to handle “no zeros” cases for the sake of mathematical clarity.</p>
<p>Then I realized there may be a philosophical reason to be dissatisfied with this shortcut. A lot of people endorse the <a href="http://philosophy.anu.edu.au/sites/default/files/Staying%20Regular.December%2028.2012.pdf" target="_blank">Regularity principle</a>: you should never assign zero probability to any possibility. For these people, the shortcut might be a dead end.</p>
<p>(Of course, maybe we shouldn’t embrace Regularity if we’re working in the accuracy framework. I won’t stop for that question here.)</p>
<h1 id="a-theorem-and-its-corollary">A Theorem and Its Corollary</h1>
<p>So let’s take the problem head on. We want to show that Euclidean distance is improper in $n > 2$ dimensions, even when there are “no zeros”. Two last bits of terminology:</p>
<ul>
<li>A probability assignment $(p_1, \ldots, p_n)$ is <em>regular</em> if $p_i > 0$ for all $i$.</li>
<li>A probability assignment $(p_1, \ldots, p_n)$ is <em>uniform</em> if $p_i = p_j$ for all $i,j$.</li>
</ul>
<p>So, for example, the assignment $(1/3, 1/3, 1/3)$ is both regular and uniform. Whereas the assignment $(2/5, 2/5, 1/5)$ is regular, but not uniform.</p>
<p>What we’ll show is that assignments like $(2/5, 2/5, 1/5)$ make Euclidean distance “unstable”: they expect some other assignment to do better, accuracy-wise. (Exactly which other assignment they’ll expect to do best isn’t always easy to say.)</p>
<p>(Though I try to keep the math in these posts as elementary as possible, this proof will use calculus. If you know a bit about derivatives, you should be fine. Technically we’ll use multi-variable calculus. But if you’ve worked with derivatives in single-variable calculus, that should be enough for the main ideas.)</p>
<p><strong>Theorem.</strong>
Let $\p = (p_1, \ldots, p_n)$ be a regular, non-uniform probability assignment. If inaccuracy is measured by Euclidean distance, then $EI_{\p}(\q)$ is not minimized when $\q = \p$.</p>
<p><em>Proof.</em>
Let $\p = (p_1, \ldots, p_n)$ be a regular and non-uniform probability assignment, and measure inaccuracy using Euclidean distance. Then:
$$
\begin{align}
EI_{\p}(\q) &= p_1 \sqrt{(q_1 - 1)^2 + \ldots + (q_n - 0)^2} + \ldots + p_n \sqrt{(q_1 - 0)^2 + \ldots + (q_n - 1)^2}\\<br />
&= p_1 \sqrt{(q_1 - 1)^2 + \ldots + q_n^2} + \ldots + p_n \sqrt{q_1^2 + \ldots + (q_n - 1)^2}
\end{align}
$$</p>
<p>The crux of our proof will be that some partial derivatives of this function are non-zero at the point $\q = \p$. Since a minimum in the interior of a differentiable function’s domain is always a <a href="https://en.wikipedia.org/wiki/Critical_point_(mathematics)" target="_blank">“critical point”</a>, that suffices to show that $\q = \p$ is not a minimum of $\EIpq$.</p>
<p>To start, we calculate the partial derivative of $\EIpq$ for an arbitrary $q_i$:
$$
\begin{align}
\frac{\partial}{\partial q_i} \EIpq
&=
\frac{\partial}{\partial q_i} \left( p_1 \sqrt{(q_1 - 1)^2 + \ldots + q_n^2} + \ldots + p_n \sqrt{q_1^2 + \ldots + (q_n - 1)^2} \right)\\<br />
&=
p_1 \frac{\partial}{\partial q_i} \sqrt{(q_1 - 1)^2 + \ldots + q_n^2} + \ldots + p_n \frac{\partial}{\partial q_i} \sqrt{q_1^2 + \ldots + (q_n - 1)^2}\\<br />
&= \quad
p_i \frac{q_i - 1}{\sqrt{(q_i - 1)^2 + \sum_{k \neq i} q_k^2}} + \sum_{j \neq i} p_j \frac{q_i}{\sqrt{(q_j - 1)^2 + \sum_{k \neq j} q_k^2}}\\<br />
&= \quad
\sum_{j \neq i} \frac{p_j q_i}{\sqrt{(q_j - 1)^2 + \sum_{k \neq j} q_k^2}} - \sum_{j \neq i} \frac{p_i q_j}{\sqrt{(q_i - 1)^2 + \sum_{k \neq i} q_k^2}}.
\end{align}
$$
The last step uses the fact that $\q$ sums to $1$, so $q_i - 1 = -\sum_{j \neq i} q_j$.</p>
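<p>A derivative like this is easy to get wrong, so here is a quick numerical check in Python. It compares the analytic partial derivative (in the form with the $q_i - 1$ numerator, before the sum-to-one simplification) against a central finite difference. The test point and step size <code>h</code> are arbitrary choices of mine.</p>

```python
import math

def dist_to_world(q, j):
    """Euclidean distance from q to the world with a 1 at index j."""
    return math.sqrt(sum((q_k - (1.0 if k == j else 0.0)) ** 2 for k, q_k in enumerate(q)))

def euclidean_ei(p, q):
    """Expected Euclidean inaccuracy of q by the lights of p."""
    return sum(p_j * dist_to_world(q, j) for j, p_j in enumerate(p))

def partial_derivative(p, q, i):
    """Analytic partial derivative of EI_p(q) with respect to q_i."""
    own = p[i] * (q[i] - 1.0) / dist_to_world(q, i)
    others = sum(p[j] * q[i] / dist_to_world(q, j) for j in range(len(p)) if j != i)
    return own + others

def finite_difference(p, q, i, h=1e-6):
    """Central-difference estimate of the same partial derivative."""
    up, down = list(q), list(q)
    up[i] += h
    down[i] -= h
    return (euclidean_ei(p, up) - euclidean_ei(p, down)) / (2 * h)

p = (0.5, 0.3, 0.2)
q = (0.6, 0.25, 0.15)
for i in range(len(q)):
    print(i, partial_derivative(p, q, i), finite_difference(p, q, i))
```

<p>The two columns should agree to several decimal places, which is some reassurance that the formula is right.</p>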
<p>Then we evaluate at $\q = \p$:
$$
\begin{align}
\frac{\partial}{\partial q_i} \EIpp
&= \sum_{j \neq i} \frac{p_i p_j}{\sqrt{(p_j - 1)^2 + \sum_{k \neq j} p_k^2}} - \sum_{j \neq i} \frac{p_i p_j}{\sqrt{(p_i - 1)^2 + \sum_{k \neq i} p_k^2}}
\end{align}
$$</p>
<p>Now, because $\p$ is not uniform, some of its elements are larger than others. And because it is finite, there is at least one largest element. When $p_i$ is one of these largest elements, then $\partial / \partial q_i \EIpp$ is negative.</p>
<p>Why?</p>
<p>In our equation for $\partial / \partial q_i \EIpp$, each positive term has a corresponding negative term whose numerator is identical. And when $p_i$ is a largest element of $\p$, the denominator of each negative term will never be larger, but will sometimes be smaller, than the denominator of its corresponding positive term. Subtracting $1$ from $p_i$ before squaring does more to reduce the sum of squares $p_i^2 + \sum_{j \neq i} p_j^2$ than subtracting $1$ from any smaller term would. It effectively removes the/a largest square from the sum and substitutes the smallest replacement. So the negative terms are never smaller, but are sometimes larger, than their positive counterparts.</p>
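<p>To see the denominator comparison at a glance, here is a short expansion. It’s my own gloss, writing $S$ for the sum of squares $\sum_k p_k^2$, a symbol not used elsewhere in this post:
$$
(p_j - 1)^2 + \sum_{k \neq j} p_k^2 = S - 2p_j + 1.
$$
So the quantity under the square root attached to $p_j$ shrinks as $p_j$ grows. When $p_i \geq p_j$ we get $S - 2p_i + 1 \leq S - 2p_j + 1$, with strict inequality whenever $p_i > p_j$.</p>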
<p>If, on the other hand, $p_i$ is one of the smallest elements, then $\partial / \partial q_i \EIpp$ is positive. For then the reverse argument applies: the denominator of each negative term will never be smaller and will sometimes be larger than the denominator of the corresponding positive term. So the negative terms are never larger, but are sometimes smaller, than their positive counterparts.</p>
<p>We have shown that the partial derivatives of $\EIpq$ with respect to the largest and smallest elements are non-zero at the point $\q = \p$. Thus $\p$ is not a critical point of $\EIpq$, and hence cannot be a minimum of $\EIpq$. <span class="floatright">$\Box$</span></p>
<p><strong>Corollary.</strong> Euclidean distance is improper in any finite number of dimensions.</p>
<p><em>Proof.</em> This is just a slight restatement of our theorem. If $\q = \p$ is not a minimum of $\EIpq$, then there is some $\q \neq \p$ such that $\EIpq < \EIpp$. <span class="floatright">$\Box$</span></p>
<h1 id="conjectures-awaiting-refutations">Conjectures Awaiting Refutations</h1>
<p>Notice, we’ve also shown something a bit stronger. We showed that the slope of $\EIpq$ at the point $\q = \p$ is always negative in the direction of $\p$’s largest element(s), and positive in the direction of its smallest element(s). That means we can always reduce expected inaccuracy by taking some small quantity away from the/a smallest element of $\p$ and adding it to the/a largest element. In other words, we can always reduce expected inaccuracy by moving <em>some</em> way towards perfect certainty in the/a possibility that $\p$ rates most probable.</p>
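<p>Here is that observation in action for $(2/5, 2/5, 1/5)$, as a small Python sketch. The helper <code>shift_mass</code> and the amount $0.01$ are illustrative choices of mine.</p>

```python
import math

def euclidean_ei(p, q):
    """Expected Euclidean inaccuracy of q by the lights of p."""
    total = 0.0
    for i, p_i in enumerate(p):
        sq = sum((q_k - (1.0 if k == i else 0.0)) ** 2 for k, q_k in enumerate(q))
        total += p_i * math.sqrt(sq)
    return total

def shift_mass(q, src, dst, amount):
    """Move `amount` of probability from index src to index dst."""
    shifted = list(q)
    shifted[src] -= amount
    shifted[dst] += amount
    return tuple(shifted)

p = (2 / 5, 2 / 5, 1 / 5)
stay = euclidean_ei(p, p)
# Take a little from the smallest element and give it to a largest one...
toward_largest = euclidean_ei(p, shift_mass(p, src=2, dst=0, amount=0.01))
# ...and, for contrast, do the reverse.
toward_smallest = euclidean_ei(p, shift_mass(p, src=0, dst=2, amount=0.01))
print(f"{toward_largest:.4f} {stay:.4f} {toward_smallest:.4f}")
```

<p>Moving probability towards a largest element lowers expected inaccuracy (roughly 0.7939 against 0.7946), while moving it the other way raises it.</p>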
<p>However, we <em>haven’t</em> shown that repeatedly minimizing expected inaccuracy will, eventually, lead to certainty in the/a possibility that was most probable to begin with. For one thing, we haven’t shown that moving towards certainty in this direction minimizes expected inaccuracy at each step. We’ve only shown that moving in this direction reduces it.</p>
<p>Still, I’m pretty sure a result along these lines holds. Tinkering in <em>Mathematica</em> strongly suggests that the following Conjectures are true in any finite number of dimensions $n$:</p>
<p><strong>Conjecture 2.</strong> If a probability assignment gives greater than $1/ 2$ probability to some possibility, then expected inaccuracy is minimized by assigning probability 1 to that possibility. (But see the <strong>Update</strong> below.)</p>
<p><strong>Conjecture 3.</strong> Given a non-uniform probability assignment, repeatedly minimizing expected inaccuracy will, within a finite number of steps, increase the probability of the/a possibility that was most probable initially beyond $1/ 2$.</p>
<p>If these conjectures hold, then there’s still a weak-ish sense in which Euclidean distance is “extremizing” in $n > 2$ dimensions. Given a non-uniform probability assignment, repeatedly minimizing expected inaccuracy will eventually lead to greater than $1/ 2$ probability in the/a possibility that was most probable to begin with. Then, minimizing inaccuracy will lead in a single step to certainty in that possibility.</p>
<p>Proving these conjectures would close much of the gap between the theorem we proved and the false conjecture I started with. If you’re interested, you can use <a href="https://github.com/jweisber/a4d/blob/master/Euclid%20in%20the%20Round.nb" target="_blank">this <em>Mathematica</em> notebook</a> to test them.</p>
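<p>For a taste of what such a test looks like outside <em>Mathematica</em>, here is a Python sketch that repeatedly minimizes expected inaccuracy over a coarse grid, starting from the earlier counterexample. The grid (with a resolution I picked arbitrarily) is only a stand-in for exact minimization, so this illustrates Conjectures 2 and 3 in one case rather than supporting them in general.</p>

```python
import math

def euclidean_ei(p, q):
    """Expected Euclidean inaccuracy of q by the lights of p."""
    total = 0.0
    for i, p_i in enumerate(p):
        sq = sum((q_k - (1.0 if k == i else 0.0)) ** 2 for k, q_k in enumerate(q))
        total += p_i * math.sqrt(sq)
    return total

def grid_argmin(p, steps=40):
    """Return the grid point q = (i, j, k) / steps that minimizes EI_p(q)."""
    best_q, best_val = None, float("inf")
    for i in range(steps + 1):
        for j in range(steps + 1 - i):
            q = (i / steps, j / steps, (steps - i - j) / steps)
            val = euclidean_ei(p, q)
            if val < best_val:
                best_q, best_val = q, val
    return best_q

# Start from the counterexample and minimize twice in a row.
history = [(5 / 12, 4 / 12, 3 / 12)]
for _ in range(2):
    history.append(grid_argmin(history[-1]))
for q in history:
    print(q)
```

<p>In this run the first minimization pushes the leading probability above $1/ 2$, and the second goes all the way to $(1, 0, 0)$.</p>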
<p><strong>Update: Mar. 6, 2017.</strong> Thanks to some excellent help from <a href="https://mathematics.stanford.edu/people/department-directory/name/jonathan-love/" target="_blank">Jonathan Love</a>, I’ve tweaked this post (and greatly simplified <a href="http://jonathanweisberg.org/post/Accuracy%20for%20Dummies%20-%20Part%203/">the previous one</a>).</p>
<p>I changed the counterexample to the false Conjecture 1, which used to be $\p = (3/7, 2/7, 2/7)$ and $\p' = (4/7, 2/7, 1/7)$. That works fine, but it’s potentially misleading.</p>
<p>As Jonathan kindly pointed out, the minimum point then is something quite nice. It’s obtained by moving in the $x$-dimension from $3/7$ to $\sqrt{3/7}$, and correspondingly reducing the probability in the $y$ and $z$ dimensions in equal parts.</p>
<p>But, in general, moving to the square root of the largest $p_i$ (when there is one) doesn’t minimize $\EIpq$. Even in the special case where all the other elements in the vector are equal, this doesn’t generally work.</p>
<p>Jonathan did solve that special case, though, and he found at least one interesting result connected with Conjecture 2. There appear to be cases where $p_i < 1/ 2$ for all $i$, and yet $\EIpq$ is still minimized by going directly to the extreme. For example, $\p = (.465, .2675, .2675)$.</p>
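<p>Both of Jonathan’s observations can be spot-checked in Python. The sketch below is mine, not his: it checks that small shifts of probability away from the $\sqrt{3/7}$ point only increase expected inaccuracy for the old counterexample, and it runs a grid search (resolution chosen arbitrarily) for $(.465, .2675, .2675)$.</p>

```python
import math

def euclidean_ei(p, q):
    """Expected Euclidean inaccuracy of q by the lights of p."""
    total = 0.0
    for i, p_i in enumerate(p):
        sq = sum((q_k - (1.0 if k == i else 0.0)) ** 2 for k, q_k in enumerate(q))
        total += p_i * math.sqrt(sq)
    return total

def grid_argmin(p, steps=60):
    """Return the grid point q = (i, j, k) / steps that minimizes EI_p(q)."""
    best_q, best_val = None, float("inf")
    for i in range(steps + 1):
        for j in range(steps + 1 - i):
            q = (i / steps, j / steps, (steps - i - j) / steps)
            val = euclidean_ei(p, q)
            if val < best_val:
                best_q, best_val = q, val
    return best_q

# Old counterexample: the claimed minimum moves 3/7 up to sqrt(3/7).
p_old = (3 / 7, 2 / 7, 2 / 7)
x = math.sqrt(3 / 7)
claimed = (x, (1 - x) / 2, (1 - x) / 2)
base = euclidean_ei(p_old, claimed)
eps = 1e-3
neighbours = []
for src in range(3):
    for dst in range(3):
        if src != dst:
            q = list(claimed)
            q[src] -= eps
            q[dst] += eps
            neighbours.append(euclidean_ei(p_old, q))
print(all(value > base for value in neighbours))

# The case with every p_i below 1/2 that still goes to the extreme.
p_new = (0.465, 0.2675, 0.2675)
print(grid_argmin(p_new))
```

<p>Expected inaccuracy here is a weighted sum of distances, which is convex in $\q$, so a point that beats all its near neighbours is a good candidate for the global minimum.</p>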
Editorial Gravity
http://jonathanweisberg.org/post/Editorial%20Gravity/
Wed, 22 Feb 2017 10:44:10 -0500
<p>We’ve all been there. One referee is positive, the other negative, and the editor decides to reject the submission.</p>
<p>I’ve heard it said editors tend to be conservative given the recommendations of their referees. And that jibes with my experience as an author.</p>
<p>So is there anything to it—is “editorial gravity” a real thing? And if it is, how strong is its pull? Is there some magic function editors use to compute their decision based on the referees’ recommendations?</p>
<p>In this post I’ll consider how things shake out at <a href="http://www.ergophiljournal.org/" target="_blank"><em>Ergo</em></a>.</p>
<h1 id="decision-rules">Decision Rules</h1>
<p><em>Ergo</em> doesn’t have any rule about what an editor’s decision should be given the referees’ recommendations. In fact, we explicitly discourage our editors from relying on any such heuristic. Instead we encourage them to rely on their judgment about the submission’s merits, informed by the substance of the referees’ reports.</p>
<p>Still, maybe there’s some natural law of journal editing waiting to be discovered here, or some unwritten rule.</p>
<p>Referees choose from four possible recommendations at <em>Ergo</em>: Reject, Major Revisions, Minor Revisions, or Accept. Let’s consider four simple rules we might use to predict an editor’s decision, given the recommendations of their referees.</p>
<ol>
<li>Max: the editor follows the recommendation of the most positive referee. (Ha!)</li>
<li>Mean: the editor “splits the difference” between the referees’ recommendations.
<ul>
<li>Accept + Major Revisions → Minor Revisions, for example.</li>
<li>When the difference is intermediate between possible decisions, we’ll stipulate that this rule “rounds down”.
<ul>
<li>Major Revisions + Minor Revisions → Major Revisions, for example.</li>
</ul></li>
</ul></li>
<li>Min: the editor follows the recommendation of the most negative referee.</li>
<li>Less-than-Min: the editor’s decision is a step more negative than either of the referees’.
<ul>
<li>Major Revisions + Minor Revisions → Reject, for example.</li>
<li>Except obviously that Reject + anything → Reject.</li>
</ul></li>
</ol>
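<p>Spelled out as code, the four rules might look like the following Python sketch. The numeric ranks (Reject = 1 up to Accept = 4) and the function names are assumptions I’m making for illustration; the analysis behind this post was done in R.</p>

```python
# Assumed numeric coding of the four recommendations, most negative first.
RANKS = {"Reject": 1, "Major Revisions": 2, "Minor Revisions": 3, "Accept": 4}
NAMES = {rank: name for name, rank in RANKS.items()}

def max_rule(recommendations):
    """Follow the most positive referee."""
    return NAMES[max(RANKS[r] for r in recommendations)]

def mean_rule(recommendations):
    """Split the difference, rounding down when it falls between decisions."""
    ranks = [RANKS[r] for r in recommendations]
    return NAMES[sum(ranks) // len(ranks)]  # floor division rounds down

def min_rule(recommendations):
    """Follow the most negative referee."""
    return NAMES[min(RANKS[r] for r in recommendations)]

def less_than_min_rule(recommendations):
    """Go one step below the most negative referee, but never below Reject."""
    lowest = min(RANKS[r] for r in recommendations)
    return NAMES[max(lowest - 1, RANKS["Reject"])]

pair = ("Major Revisions", "Minor Revisions")
for rule in (max_rule, mean_rule, min_rule, less_than_min_rule):
    print(rule.__name__, "->", rule(pair))
```

<p>For a Major Revisions + Minor Revisions pair, this gives Minor Revisions (Max), Major Revisions (Mean and Min), and Reject (Less-than-Min), matching the examples in the list.</p>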
<p>Do any of these rules do a decent job of predicting editorial decisions? If so, which does best?</p>
<h1 id="a-test">A Test</h1>
<p>Let’s run the simplest test possible. We’ll go through the externally reviewed submissions in <em>Ergo</em>’s database and see how often each rule makes the correct prediction.</p>
<p><img src="http://jonathanweisberg.org/img/editorial_gravity_files/unnamed-chunk-2-1.png" alt="" /></p>
<p>Not only was Min the most accurate rule, its predictions were correct 85% of the time! (The sample size here is 233 submissions, by the way.) Apparently, editorial gravity is a real thing, at least at <em>Ergo</em>.</p>
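<p>The test itself is just a tally. Here is a Python sketch of the idea on a handful of made-up submissions. The data below are hypothetical, not <em>Ergo</em>’s, and the rank coding (Reject = 1 up to Accept = 4) is an assumption.</p>

```python
# Each rule maps a tuple of referee ranks to a predicted decision rank.
RULES = {
    "Max": lambda ranks: max(ranks),
    "Mean": lambda ranks: sum(ranks) // len(ranks),  # rounds down
    "Min": lambda ranks: min(ranks),
    "Less-than-Min": lambda ranks: max(min(ranks) - 1, 1),
}

def accuracy(rule, cases):
    """Share of cases where the rule's prediction equals the editor's decision."""
    hits = sum(1 for ranks, decision in cases if rule(ranks) == decision)
    return hits / len(cases)

# Hypothetical submissions, NOT Ergo's data: (referee ranks, editor's decision).
toy_cases = [((1, 3), 1), ((2, 4), 2), ((3, 3), 3), ((2, 3), 2), ((4, 4), 3)]

scores = {name: accuracy(rule, toy_cases) for name, rule in RULES.items()}
for name, score in scores.items():
    print(f"{name}: {score:.0%}")
```

<p>On this toy data Min scores 80%, Mean and Less-than-Min 40%, and Max 20%. The real figures are the ones in the plot above.</p>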
<p>Of course, <em>Ergo</em> might be atypical here. It’s a new journal, and online-only with no regular publication schedule. So there’s some pressure to play it safe, and no incentive to accept papers in order to fill space.</p>
<p>But let’s suppose for a moment that <em>Ergo</em> is typical as far as editorial gravity goes. That raises some questions. Here are two.</p>
<h1 id="two-questions">Two Questions</h1>
<p>First question: can we improve on the Min rule? Is there a not-too-complicated heuristic that’s even more accurate?</p>
<p>Visualizing our data might help us spot any patterns. Typically there are two referees, so we can plot most submissions on a plane according to the referees’ recommendations. Then we can colour them according to the editor’s decision. Adding a little random jitter to make all the points visible:</p>
<p><img src="http://jonathanweisberg.org/img/editorial_gravity_files/unnamed-chunk-3-1.png" alt="" /></p>
<p>To my eye this looks a lot like the pattern of concentric-corners you’d expect from the Min rule. Though not exactly, especially when the two referees strongly disagree—the top-left and bottom-right corners of the plot. Still, other than treating cases of strong disagreement as a tossup, no simple way of improving on the Min rule jumps out at me.</p>
<p>Second question: if editorial gravity is a thing, is it a good thing or a bad thing?</p>
<p>I’ll leave that as an exercise for the reader.</p>
<h1 id="technical-note">Technical Note</h1>
<p>This post was written in R Markdown and the source code is <a href="https://github.com/jweisber/rgo/blob/master/editorial gravity/editorial gravity.Rmd" target="_blank">available on GitHub</a>.</p>
Gender & Journal Referees
http://jonathanweisberg.org/post/Referee%20Gender/
Mon, 20 Feb 2017 09:34:10 -0500
<p>We looked at author gender in <a href="http://jonathanweisberg.org/post/Author Gender/">a previous post</a>. Today let’s consider referees. Does their gender have any predictive value?</p>
<p>Once again our discussion only covers men and women because we don’t have the data to support a deeper analysis.<sup class="footnote-ref" id="fnref:1"><a rel="footnote" href="#fn:1">1</a></sup></p>
<p>Using data from <a href="http://www.ergophiljournal.org/" target="_blank"><em>Ergo</em></a>, we’ll consider the following questions:</p>
<ol>
<li><em>Requests</em>. How are requests to referee distributed between men and women? Are men more likely to be invited, for example?</li>
<li><em>Responses</em>. Does gender inform a referee’s response to a request? Are women more likely to say ‘yes’, for example?</li>
<li><em>Response-speed</em>. Does gender inform how quickly a referee responds to an invitation (whether to agree or to decline)? Do men take longer to agree/decline an invitation, for example?</li>
<li><em>Completion-speed</em>. If a referee does agree to provide a report, does their gender inform how quickly they’ll complete that report? Do men and women tend to complete their reports in the same time-frame?</li>
<li><em>Recommendations</em>. Does gender inform how positive/negative a referee’s recommendation is? Are men and women equally likely to recommend that a submission be rejected, for example?</li>
<li><em>Influence</em>. Does a referee’s gender affect the influence of their recommendation on the editor’s decision? Are the recommendations of male referees more likely to be followed, for example?</li>
</ol>
<p>A quick overview of our data set: there are a total of 1526 referee-requests in <em>Ergo</em>’s database. But only 1394 are included in this analysis. I’ve excluded:</p>
<ol>
<li>Requests to review an invited resubmission, since these are a different sort of beast.</li>
<li>Pending requests and reports, since the data for these are incomplete.</li>
<li>A handful of cases where the referee’s gender is either unknown, or doesn’t fit the male/female classification.</li>
</ol>
<h1 id="requests">Requests</h1>
<p>How are requests distributed between men and women? 322 of our 1394 requests went to women, or 23.1% (1072 went to men, or 76.9%).</p>
<p>How does this compare to the way men and women are represented in academic philosophy in general? Different sources and different subpopulations yield a range of estimates.</p>
<p>At the low end, we saw in <a href="http://jonathanweisberg.org/post/Author Gender/">an earlier post</a> that about 15.3% of <em>Ergo</em>’s submissions come from women. The PhilPapers survey yields a range from 16.2% (<a href="https://philpapers.org/surveys/demographics.pl" target="_blank">all respondents</a>) to 18.4% (<a href="https://philpapers.org/surveys/demographics.pl?affil=Target+faculty&survey=8" target="_blank">“target” faculty</a>). And sources cited in <a href="http://www.faculty.ucr.edu/~eschwitz/SchwitzPapers/WomenInPhil-160315b.pdf" target="_blank">Schwitzgebel & Jennings</a> estimate the percentage of women faculty in various English speaking countries at 23% for Australia, 24% for the U.K., and 19–26% for the U.S.</p>
<p>So we have a range of baseline estimates from 15% to 26%. For comparison, the 95% confidence interval around our 23.1% finding is (21%, 25.4%).</p>
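<p>For the curious, an interval like this can be reproduced in a few lines of Python. The sketch below assumes a Wilson score interval, which happens to agree with the figures quoted here; the original R code may well compute it differently, and the function name is my own.</p>

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z = 1.96 for roughly 95%)."""
    p_hat = successes / n
    denom = 1 + z ** 2 / n
    center = (p_hat + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half_width, center + half_width

# 322 of the 1394 requests went to women.
low, high = wilson_interval(322, 1394)
print(f"{low:.1%} to {high:.1%}")
```

<p>This prints <code>21.0% to 25.4%</code>, in line with the interval above.</p>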
<h1 id="responses">Responses</h1>
<p>Do men and women differ in their responses to these requests? Here are the raw numbers:</p>
<table>
<thead>
<tr>
<th align="left"></th>
<th align="right">Agreed</th>
<th align="right">Declined / No Response / Canceled</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">Female</td>
<td align="right">101</td>
<td align="right">221</td>
</tr>
<tr>
<td align="left">Male</td>
<td align="right">403</td>
<td align="right">669</td>
</tr>
</tbody>
</table>
<p>The final column calls for some explanation. I’m lumping together several scenarios here: (i) the referee responds to decline the request, (ii) the referee never responds, (iii) the editors cancel the request because it was made in error. Unfortunately, these three scenarios are hard to distinguish based on the raw data. For example, sometimes a referee declines by email rather than via our online system, and the handling editor then cancels the request instead of marking it as “Declined”.</p>
<p>With that in mind, here are the proportions graphically:</p>
<p><img src="http://jonathanweisberg.org/img/referee_gender_files/unnamed-chunk-6-1.png" alt="" /></p>
<p>Men agreed more often than women: approximately 38% vs. 31%. And this difference is statistically significant.<sup class="footnote-ref" id="fnref:0"><a rel="footnote" href="#fn:0">2</a></sup></p>
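<p>One way to check a claim like this from the raw numbers is a chi-square test of independence on the 2×2 table. The Python sketch below assumes that test (with and without Yates’ continuity correction); the footnote has the details of the test actually used, and the function name is mine.</p>

```python
import math

def chi_square_2x2(table, yates=False):
    """Chi-square test of independence for a 2x2 table of counts.
    Returns the statistic and its p-value (1 degree of freedom)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for r, row in enumerate(table):
        for k, observed in enumerate(row):
            expected = row_totals[r] * col_totals[k] / n
            diff = abs(observed - expected)
            if yates:
                diff = max(diff - 0.5, 0.0)  # continuity correction
            stat += diff ** 2 / expected
    # With 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Rows: Female, Male. Columns: Agreed, Declined / No Response / Canceled.
responses = ((101, 221), (403, 669))
stat, p_value = chi_square_2x2(responses)
stat_yates, p_yates = chi_square_2x2(responses, yates=True)
print(round(stat, 2), round(p_value, 3))
print(round(stat_yates, 2), round(p_yates, 3))
```

<p>Either way the p-value comes out a little under $0.05$, consistent with the difference being statistically significant at the conventional level.</p>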
<p>Note that women and men accounted for about 20% and 80% of the “Agreed” responses, respectively. Whether this figure differs significantly from the gender makeup of “the general population” depends, as before, on the source and subpopulation we use for that estimate.</p>
<p>We saw that estimates of female representation ranged from roughly 15% to 26%. For comparison, the 95% confidence interval around our 20% finding is (16.8%, 23.8%).</p>
<h1 id="response-speed">Response-speed</h1>
<p>Do men and women differ in response-speed—in how quickly they respond to a referee request (whether to agree or to decline)?</p>
<p>The average response-time for women is 1.92 days, and for men it’s 1.58 days. This difference is not statistically significant.<sup class="footnote-ref" id="fnref:3"><a rel="footnote" href="#fn:3">3</a></sup></p>
<p>A boxplot likewise suggests that men and women have similar interquartile ranges:</p>
<p><img src="http://jonathanweisberg.org/img/referee_gender_files/unnamed-chunk-9-1.png" alt="" /></p>
<h1 id="completion-speed">Completion-speed</h1>
<p>What about completion-speed: is there any difference in how long men and women take to complete their reports?</p>
<p>Women took 27.6 days on average, while men took 23.8 days. This difference is statistically significant.<sup class="footnote-ref" id="fnref:4"><a rel="footnote" href="#fn:4">4</a></sup></p>
<p>Does that mean men are more likely to complete their reports on time? Not necessarily. Here’s a frequency polygon showing when reports were completed:</p>
<p><img src="http://jonathanweisberg.org/img/referee_gender_files/unnamed-chunk-11-1.png" alt="" /></p>
<p>The spike at the four-week mark corresponds to the standard due date. We ask referees to submit their reports within 28 days of the initial request.</p>
<p>It looks like men had a stronger tendency to complete their reports early. But were they more likely to complete them on time?</p>
<p>One way to tackle this question is to look at how completed reports accumulate with time (the <a href="https://en.wikipedia.org/wiki/Empirical_distribution_function" target="_blank">empirical cumulative distribution</a>):</p>
<p><img src="http://jonathanweisberg.org/img/referee_gender_files/unnamed-chunk-12-1.png" alt="" /></p>
<p>As expected, the plot shows that men completed their reports early with greater frequency. But it also looks like women and men converged around the four-week mark, when reports were due.</p>
<p>Another way of approaching the question is to classify reports as either “On Time” or “Late”, according to whether they were completed before Day 29.</p>
<table>
<thead>
<tr>
<th align="left"></th>
<th align="right">On Time</th>
<th align="right">Late</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">Female</td>
<td align="right">50</td>
<td align="right">38</td>
</tr>
<tr>
<td align="left">Male</td>
<td align="right">242</td>
<td align="right">121</td>
</tr>
</tbody>
</table>
<p><img src="http://jonathanweisberg.org/img/referee_gender_files/unnamed-chunk-14-1.png" alt="" /></p>
<p>A chi-square test of independence then finds no statistically significant difference.<sup class="footnote-ref" id="fnref:6"><a rel="footnote" href="#fn:6">5</a></sup></p>
<p>Apparently men and women differed in their tendency to be early, but not necessarily in their tendency to be on time.</p>
<h1 id="recommendations">Recommendations</h1>
<p>Did male and female referees differ in their recommendations to the editors?</p>
<p><em>Ergo</em> offers referees four recommendations to choose from. The raw numbers:</p>
<table>
<thead>
<tr>
<th align="left"></th>
<th align="right">Reject</th>
<th align="right">Major Revisions</th>
<th align="right">Minor Revisions</th>
<th align="right">Accept</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">Female</td>
<td align="right">42</td>
<td align="right">29</td>
<td align="right">9</td>
<td align="right">8</td>
</tr>
<tr>
<td align="left">Male</td>
<td align="right">154</td>
<td align="right">103</td>
<td align="right">61</td>
<td align="right">45</td>
</tr>
</tbody>
</table>
<p>In terms of frequencies:</p>
<p><img src="http://jonathanweisberg.org/img/referee_gender_files/unnamed-chunk-16-1.png" alt="" /></p>
<p>The differences here are not statistically significant according to a chi-square test of independence.<sup class="footnote-ref" id="fnref:5"><a rel="footnote" href="#fn:5">6</a></sup></p>
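<p>The chi-square test on the 2×4 table of recommendations can be reproduced by hand. The Python sketch below computes the statistic and uses the closed-form tail probability for 3 degrees of freedom, which is what a 2×4 table has. The function names are illustrative, and this is not the R code behind the post.</p>

```python
import math

def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for r, row in enumerate(table):
        for c, observed in enumerate(row):
            expected = row_totals[r] * col_totals[c] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

def chi_square_sf_3df(x):
    """P(X > x) for a chi-square variable with exactly 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

# Rows: Female, Male. Columns: Reject, Major Revisions, Minor Revisions, Accept.
recommendations = ((42, 29, 9, 8), (154, 103, 61, 45))
stat, dof = chi_square_statistic(recommendations)
p_value = chi_square_sf_3df(stat)
print(round(stat, 2), dof, round(p_value, 2))
```

<p>The statistic comes out around $3.6$ on 3 degrees of freedom, for a p-value of roughly $0.3$, nowhere near the usual $0.05$ threshold.</p>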
<h1 id="influence">Influence</h1>
<p>Does a referee’s gender affect whether the editor follows their recommendation? We can tackle this question a few different ways.</p>
<p>One way is to just tally up those cases where the editor’s decision was the same as the referee’s recommendation, and those where it was different.</p>
<table>
<thead>
<tr>
<th align="left"></th>
<th align="right">Same</th>
<th align="right">Different</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">Female</td>
<td align="right">51</td>
<td align="right">37</td>
</tr>
<tr>
<td align="left">Male</td>
<td align="right">206</td>
<td align="right">157</td>
</tr>
</tbody>
</table>
<p><img src="http://jonathanweisberg.org/img/referee_gender_files/unnamed-chunk-17-1.png" alt="" /></p>
<p>Clearly there’s no statistically significant difference between male and female referees here.<sup class="footnote-ref" id="fnref:7"><a rel="footnote" href="#fn:7">7</a></sup></p>
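<p>Here’s a small Python sketch of that test for the 2×2 table above. One assumption to flag: I’m applying Yates’s continuity correction, which the reported statistic is consistent with, but the text doesn’t say so explicitly. The function name <code>yates_chi_square_2x2</code> is just for illustration.</p>

```python
import math


def yates_chi_square_2x2(a, b, c, d):
    """Chi-square test for the 2x2 table [[a, b], [c, d]] with Yates's correction.

    Returns (statistic, p). With one degree of freedom, the upper-tail
    probability is erfc(sqrt(statistic / 2)).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    # Shrink |ad - bc| by n/2, but never below zero.
    corrected = max(0.0, abs(a * d - b * c) - n / 2)
    stat = n * corrected ** 2 / (row1 * row2 * col1 * col2)
    return stat, math.erfc(math.sqrt(stat / 2))


# Rows: Female, Male. Columns: Same, Different.
stat, p = yates_chi_square_2x2(51, 37, 206, 157)
print(round(stat, 2), round(p, 2))
```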
<p>A second approach would be to assign numerical ranks to referees’ recommendations and editors’ decisions: Reject = 1, Major Revisions = 2, etc. Then we can consider how far the editor’s decision is from the referee’s recommendation. For example, a decision of Accept is 3 away from a recommendation of Reject, while a decision of Major Revisions is 2 away from a recommendation of Accept.</p>
<p>By this measure, the average distance between the referee’s recommendation and the editor’s decision was 0.57 for women and 0.56 for men—clearly not a statistically significant difference.<sup class="footnote-ref" id="fnref:8"><a rel="footnote" href="#fn:8">8</a></sup></p>
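<p>To make the distance measure concrete, here’s a minimal Python sketch. The ranks are the ones described above; the list of (recommendation, decision) pairs is invented purely for illustration and is not <em>Ergo</em>’s data.</p>

```python
# Numerical ranks for recommendations and decisions, as described above.
RANKS = {"Reject": 1, "Major Revisions": 2, "Minor Revisions": 3, "Accept": 4}


def rank_distance(recommendation, decision):
    """How far the editor's decision is from the referee's recommendation."""
    return abs(RANKS[recommendation] - RANKS[decision])


def mean_distance(pairs):
    """Average distance over a list of (recommendation, decision) pairs."""
    return sum(rank_distance(r, d) for r, d in pairs) / len(pairs)


# Hypothetical pairs, for illustration only.
example_pairs = [
    ("Reject", "Reject"),
    ("Major Revisions", "Reject"),
    ("Accept", "Major Revisions"),
    ("Minor Revisions", "Minor Revisions"),
]
print(mean_distance(example_pairs))
```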
<h1 id="summary">Summary</h1>
<p>Men received more requests to referee than women, as expected given the well-known gender imbalance in academic philosophy. The distribution of requests between men (76.9%) and women (23.1%) was in line with some estimates of the gender makeup of academic philosophy, though not all estimates.</p>
<p>Men were more likely to agree to a request (38% vs. 31%), a statistically significant difference. Women accounted for about 20% of the “Agreed” responses, however, consistent with most (but not all) estimates of the gender makeup of academic philosophy.</p>
<p>There was no statistically significant difference in response-speed, but there was in the speed with which reports were completed (23.8 days on average for men, 27.6 days for women). This difference appears to be due to a stronger tendency on the part of men to complete their reports early, though not necessarily a greater chance of meeting the deadline.</p>
<p>Finally, there was no statistically significant difference in the recommendations of male and female referees, or in editors’ uptake of those recommendations.</p>
<h1 id="technical-notes">Technical Notes</h1>
<p>This post was written in R Markdown and the source is <a href="https://github.com/jweisber/rgo/blob/master/referee%20gender/referee%20gender.Rmd" target="_blank">available on GitHub</a>. I’m new to both R and classical statistics, and this post is a learning exercise for me. So I encourage you to check the code and contact me with corrections.</p>
<div class="footnotes">
<hr />
<ol>
<li id="fn:1">Unlike in the previous analysis of author gender, however, here we do have a few known cases where either (i) the referee identifies as neither male nor female, or (ii) they identify as something more specific, e.g. “transgender male” rather than just “male”. But these cases are still too few for statistical analysis.
<a class="footnote-return" href="#fnref:1"><sup>[return]</sup></a></li>
<li id="fn:0">$\chi^2$(1, <em>N</em> = 1394) = 3.89, <em>p</em> = 0.05.
<a class="footnote-return" href="#fnref:0"><sup>[return]</sup></a></li>
<li id="fn:3"><em>t</em>(437.43) = -1.63, <em>p</em> = 0.1.
<a class="footnote-return" href="#fnref:3"><sup>[return]</sup></a></li>
<li id="fn:4"><em>t</em>(144.26) = -2.46, <em>p</em> = 0.02.
<a class="footnote-return" href="#fnref:4"><sup>[return]</sup></a></li>
<li id="fn:6">$\chi^2$(1, <em>N</em> = 451) = 2.59, <em>p</em> = 0.11.
<a class="footnote-return" href="#fnref:6"><sup>[return]</sup></a></li>
<li id="fn:5">$\chi^2$(3, <em>N</em> = 451) = 3.6, <em>p</em> = 0.31.
<a class="footnote-return" href="#fnref:5"><sup>[return]</sup></a></li>
<li id="fn:7">$\chi^2$(1, <em>N</em> = 451) = 0.01, <em>p</em> = 0.93.
<a class="footnote-return" href="#fnref:7"><sup>[return]</sup></a></li>
<li id="fn:8"><em>t</em>(117.57) = 0.07, <em>p</em> = 0.95.
<a class="footnote-return" href="#fnref:8"><sup>[return]</sup></a></li>
</ol>
</div>
In Defense of Reviewer 2
http://jonathanweisberg.org/post/Reviewer%202/
Mon, 06 Feb 2017 10:36:10 -0500http://jonathanweisberg.org/post/Reviewer%202/
<p>Spare a thought for Reviewer 2, that much-maligned shade of academe. There’s even <a href="https://twitter.com/hashtag/reviewer2" target="_blank">a hashtag</a> dedicated to the joke:</p>
<p><blockquote class="twitter-tweet tw-align-center" data-lang="en"><p lang="en" dir="ltr">A rare glimpse of reviewer 2, seen here in their natural habitat <a href="https://t.co/lpT1BVhDCX">pic.twitter.com/lpT1BVhDCX</a></p>— Aidan McGlynn (@AidanMcGlynn) <a href="https://twitter.com/AidanMcGlynn/status/820647829446283264">January 15, 2017</a></blockquote>
<script async src="http://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
<p>But is it just a joke? Order could easily matter here.</p>
<p>Referees invited later weren’t the editor’s first choice, after all. Maybe they’re less competent, less likely to appreciate your brilliant insights as an author. Or maybe they’re more likely to miss well-disguised flaws! Then we should expect Reviewer 2 to be the more <em>generous</em> one.</p>
<p>Come to think of it, we can order referees in other ways besides order-of-invite. We might order them according to who completes their report fastest, for example. And faster referees might be more careless, hence more dismissive. Or they might be less critical and thus more generous.</p>
<p>There’s a lot to consider. Let’s investigate, using <a href="http://www.ergophiljournal.org/" target="_blank"><em>Ergo</em></a>’s data, <a href="http://jonathanweisberg.org/tags/rgo/">as usual</a>.</p>
<h1 id="severity-generosity">Severity & Generosity</h1>
<p>Reviewer 2 is accused of a lot. It’s not just that their overall take is more severe; they also tend to miss the point. They’re irresponsible and superficial in their reading. And to the extent they do appreciate the author’s point, their objections are poorly thought out. What’s more, if they bother to demand revisions, their demands are unreasonable.</p>
<p>We can’t measure these things directly, of course. But we can estimate a referee’s generosity indirectly, using their recommendation to the editors as a proxy.</p>
<p><em>Ergo</em>’s referees choose from four possible recommendations: Reject, Major Revisions, Minor Revisions, and Accept. To estimate a referee’s generosity, we’ll assign these recommendations numerical ranks, from 1 (Reject) up through 4 (Accept).</p>
<p>The higher this number, the more generous the referee; the lower, the more severe.</p>
<h1 id="invite-order">Invite Order</h1>
<p>Is there any connection between the order in which referees are invited and their severity?</p>
<p>Usually an editor has to try a few people before they get two takers. So we can assign each potential referee an “invite rank”. The first person asked has rank 1, the second person asked has rank 2, and so on.</p>
<p>Is there a correlation between invite rank and severity?</p>
<p>Here’s a plot of invite rank (<em>x</em>-axis) and generosity (<em>y</em>-axis). (The points have non-integer heights because I’ve added some random <a href="http://r4ds.had.co.nz/data-visualisation.html#position-adjustments" target="_blank">“jitter”</a> to make them all visible. Otherwise you’d just see an uninformative grid.)</p>
<p><img src="http://jonathanweisberg.org/img/reviewer_2_files/unnamed-chunk-2-1.png" alt="" /></p>
<p>The blue curve shows the overall trend in the data.<sup class="footnote-ref" id="fnref:1"><a rel="footnote" href="#fn:1">1</a></sup> It’s basically flat all the way through, except at the far-right end where the data is too sparse to be informative.</p>
<p>We can also look at the classic measure of correlation known as <a href="https://en.wikipedia.org/wiki/Spearman's_rank_correlation_coefficient" target="_blank">Spearman’s rho</a>. The estimate is essentially 0 given our data ($r_s$ = 0.01).<sup class="footnote-ref" id="fnref:2"><a rel="footnote" href="#fn:2">2</a></sup></p>
<p>Evidently, invite-rank has no discernible impact on severity.</p>
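<p>In case it helps to see what Spearman’s rho is doing, here’s a bare-bones Python sketch: replace each value by its rank (averaging ranks across ties, which matter a lot with only four possible recommendations), then take the ordinary Pearson correlation of the ranks. The helper names and the toy data are my own inventions, not <em>Ergo</em>’s numbers.</p>

```python
import math


def average_ranks(values):
    """Rank values from 1 upward, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation; assumes neither list is constant."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def spearman_rho(xs, ys):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Toy data: invite rank and generosity (1 = Reject ... 4 = Accept).
invite_rank = [1, 2, 3, 4, 5, 6]
generosity = [2, 1, 4, 1, 3, 2]
print(spearman_rho(invite_rank, generosity))
```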
<h1 id="speed">Speed</h1>
<p>But now let’s look at the speed with which a referee completes their report:</p>
<p><img src="http://jonathanweisberg.org/img/reviewer_2_files/unnamed-chunk-4-1.png" alt="" /></p>
<p>Here an upward trend is discernible. And our estimate of Spearman’s rho agrees: $r_s$ = 0.1, a small but non-trivial correlation.<sup class="footnote-ref" id="fnref:3"><a rel="footnote" href="#fn:3">3</a></sup></p>
<p>Apparently, referees who take longer tend to be more generous!</p>
<h1 id="my-take">My Take</h1>
<p>I find these results encouraging, for the most part.</p>
<p>It’s nice to know that an editor’s first choice for a referee is the same as their fifth, as far as how severe or generous they’re likely to be.</p>
<p>It’s also nice to know that the speed with which a referee completes their report doesn’t <em>hugely</em> inform their severity.</p>
<p>One might well worry that faster referees are unduly severe. But this worry is tempered by a few considerations.</p>
<p>For one thing, the effect we found is small enough that it could just be noise. It is detectable using tools like regression and significance testing, so it’s not to be dismissed out of hand. But we might also do well to heed the wisdom of <a href="https://xkcd.com/1725/" target="_blank">XKCD</a> here:</p>
<p><img src="https://imgs.xkcd.com/comics/linear_regression_2x.png" alt="" /></p>
<p>Even if the effect is real, though, it could be a good thing just as easily as a bad thing.</p>
<p>True, referees who work fast might be sloppy and dismissive. And those who take longer might feel guiltier and thus be unduly generous.</p>
<p>But maybe referees who are more on the ball are both more prompt and more apt to spot a submission’s flaws. Or (as my coeditor Franz Huber pointed out) manuscripts that should clearly be rejected might be easier to referee on average, hence faster.</p>
<p>It’s hard to know what to make of this effect, if it is an effect. Clearly, <a href="https://twitter.com/hashtag/moreresearchisneeded" target="_blank">#MoreResearchIsNeeded</a>.</p>
<h1 id="technical-notes">Technical Notes</h1>
<p>This post was written in R Markdown and the source is <a href="https://github.com/jweisber/rgo/blob/master/reviewer%202/reviewer%202.Rmd" target="_blank">available on GitHub</a>. I’m new to both R and statistics, and this post is a learning exercise for me. So I encourage you to check the code and contact me with corrections.</p>
<div class="footnotes">
<hr />
<ol>
<li id="fn:1">Specifically, the blue curve is a regression curve using the <a href="https://en.wikipedia.org/wiki/Local_regression#Definition_of_a_LOESS_model" target="_blank">LOESS</a> method of fit.
<a class="footnote-return" href="#fnref:1"><sup>[return]</sup></a></li>
<li id="fn:2">A significance test of the null hypothesis $\rho_s$ = 0 yields <em>p</em> = 0.87.
<a class="footnote-return" href="#fnref:2"><sup>[return]</sup></a></li>
<li id="fn:3">Testing the null hypothesis $\rho_s$ = 0 yields <em>p</em> = 0.03.
<a class="footnote-return" href="#fnref:3"><sup>[return]</sup></a></li>
</ol>
</div>
http://jonathanweisberg.org/2/
Wed, 01 Feb 2017 10:44:10 -0500http://jonathanweisberg.org/2/<ul>
<li>Three Lectures in Formal Epistemology, NIP 2010: 1) <a href="http://jonathanweisberg.org/pdf/NIP - ULPs.pdf">Upper & Lower Probabilities</a>, 2) <a href="http://jonathanweisberg.org/pdf/NIP - DST.pdf">Dempster-Shafer Theory</a>, and 3) <a href="http://jonathanweisberg.org/pdf/NIP - Pollock.pdf">Pollock’s Theory of Defeasible Reasoning</a></li>
<li><a href="http://jonathanweisberg.org/pdf/C_R_and_SKvWS.pdf">Conditionalization Without Reflection</a> — An extended version of <a href="http://jonathanweisberg.org/pdf/C_R_and_SKv2.SP.pdf">Conditionalization, Reflection, and Self-Knowledge</a></li>
</ul>
Accuracy for Dummies, Part 3: Beyond the Second Dimension
http://jonathanweisberg.org/post/Accuracy%20for%20Dummies%20-%20Part%203/
Fri, 27 Jan 2017 00:00:00 -0500http://jonathanweisberg.org/post/Accuracy%20for%20Dummies%20-%20Part%203/
<p>Last time we saw why accuracy-mavens prefer Brier distance to Euclidean distance. But we did everything in two dimensions. That’s fine for a coin toss, with only two possibilities. But what if there are three doors and one of them has a prize behind it??</p>
<p>Don’t panic! Today we’re going to verify that Brier distance is still a proper way of measuring inaccuracy, even when there are more than two possibilities. (Next time we’ll talk about Euclidean distance with more than two possibilities.)</p>
<p>Let’s start small, with just three possibilities. $\renewcommand{\vec}[1]{\mathbf{#1}}\newcommand{\p}{\vec{p}}\newcommand{\q}{\vec{q}}\newcommand{\v}{\vec{v}}\newcommand{\EIpq}{EI_{\p}(\q)}\newcommand{\EIpp}{EI_{\p}(\p)}$</p>
<h1 id="three-possibilities">Three Possibilities</h1>
<p>You’re on a game show; there are three doors; one has a prize behind it. The three possibilities are represented by the vertices $(1,0,0)$, $(0,1,0)$, and $(0,0,1)$:</p>
<p><img src="http://jonathanweisberg.org/img/accuracy/Three Vertices.png" alt="" /></p>
<p>Your credences are given by some probability assignment $(p_1, p_2, p_3)$. It might be $(1/ 3, 1/ 3, 1/ 3)$ but it could be anything… $(7/ 10, 2/ 10, 1/ 10)$, for example.</p>
<p>In case you’re curious, here’s what the range of possible probability assignments looks like in graphical terms:</p>
<p><img src="http://jonathanweisberg.org/img/accuracy/Three Vertices with Hull.png" alt="" /></p>
<p>The triangular surface is the three-dimensional analogue of the diagonal line in <a href="http://jonathanweisberg.org/img/accuracy/2D Dominance Diagram.png">the two-dimensional diagram</a> from our <a href="http://jonathanweisberg.org/post/Accuracy for Dummies - Part 1/">first post</a> in this series.</p>
<p>It’ll be handy to refer to points on this surface using single letters, like $\p$ for $(p_1, p_2, p_3)$. We’ll write these letters in bold, to distinguish a sequence of numbers like $\p$ from a single number like $p_1$. (In math-speak, $\p$ is a <em>vector</em> and $p_1$ is a <em>scalar</em>.)</p>
<p>Our job is to show that Brier distance is “proper” in three dimensions. Let’s recall what that means: given a point $\p$, the expected Brier distance (according to $\p$) of a point $\q = (q_1, q_2, q_3)$ from the three vertices is always smallest when $\q = \p$.</p>
<p>What does <em>that</em> mean?</p>
<p>Recall, the Brier distance from $\q$ to the vertex $(1, 0, 0)$ is:
$$
(q_1 - 1)^2 + (q_2 - 0)^2 + (q_3 - 0)^2
$$
Or, more succinctly:
$$
(q_1 - 1)^2 + q_2^2 + q_3^2
$$
So the <em>expected</em> Brier distance of $\q$ according to $\p$ weights each such sum by the probability $\p$ assigns to the corresponding vertex.
$$
\begin{align}
&\quad\quad p_1 \left( (q_1 - 1)^2 + q_2^2 + q_3^2 \right)\\<br />
&\quad + p_2 \left( q_1^2 + (q_2 - 1)^2 + q_3^2 \right)\\<br />
&\quad + p_3 \left( q_1^2 + q_2^2 + (q_3 - 1)^2 \right)
\end{align}
$$
We need to show that this quantity is smallest when $\q = \p$, i.e. when $q_1 = p_1$, $q_2 = p_2$, and $q_3 = p_3$.</p>
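<p>Before doing any math, we can spot-check the claim numerically. The Python sketch below is only an illustration, not a proof: it computes the expected Brier distance exactly as in the formula above, then searches a coarse grid of candidate points $\q$ (steps of $1/10$, edges included) for the one with the smallest expected inaccuracy. The function names are mine, and exact fractions are used to avoid rounding noise.</p>

```python
from fractions import Fraction as F


def brier_distance(q, world):
    """Sum of squared differences between q and a vertex like (1, 0, 0)."""
    return sum((qi - ui) ** 2 for qi, ui in zip(q, world))


def expected_inaccuracy(p, q):
    """Expected Brier distance of q, weighting each vertex by p's probability."""
    n = len(p)
    total = 0
    for i in range(n):
        world = [1 if j == i else 0 for j in range(n)]
        total += p[i] * brier_distance(q, world)
    return total


p = (F(7, 10), F(2, 10), F(1, 10))

# Every q = (a/10, b/10, c/10) whose entries sum to 1.
grid = [(F(a, 10), F(b, 10), F(10 - a - b, 10))
        for a in range(11) for b in range(11 - a)]

best = min(grid, key=lambda q: expected_inaccuracy(p, q))
print(best == p, expected_inaccuracy(p, p))
```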
<h2 id="visualizing-expected-inaccuracy">Visualizing Expected Inaccuracy</h2>
<p>Let’s do some visualization. We’ll take a few examples of $\p$, and graph the expected inaccuracy of other possible points $\q$, using Brier distance to measure inaccuracy.</p>
<p>For example, suppose $\p = (1/ 3, 1/ 3, 1/ 3)$. Then the expected inaccuracy of each point $\q$ looks like this:</p>
<p><img src="http://jonathanweisberg.org/img/accuracy/BrierEI3.png" alt="" /></p>
<p>The horizontal axes represent $q_1$ and $q_2$. The vertical axis represents expected inaccuracy.</p>
<p>Where’s $q_3$?? Not pictured! If we used all three visible dimensions for the elements of $\q$, we’d have nothing left to visualize expected inaccuracy. But $q_3$ is there implicitly. You can always get $q_3$ by calculating $1 - (q_1 + q_2)$, because $\q$ is a probability assignment. So we don’t actually need $q_3$ in the graph!</p>
<p>Now, the red dot is the lowest point on the surface: the smallest possible expected inaccuracy, according to $\p$. But where is that in terms of $q_1$ and $q_2$? Let’s look at the same graph from directly above:</p>
<p><img src="http://jonathanweisberg.org/img/accuracy/BrierEI3-above.png" alt="" /></p>
<p>Hey! Looks like the red dot is located at $q_1 = 1/ 3$ and $q_2 = 1/ 3$, i.e. at $\q = (1/ 3, 1/ 3, 1/ 3)$. Also known as $\p$. So that’s promising: looks like expected inaccuracy is minimized when $\q = \p$, at least in this example.</p>
<p>Let’s do one more example, $\p = (6/ 10, 3/ 10, 1/ 10)$. Then the expected Brier distance of each point $\q$ looks like this:
<img src="http://jonathanweisberg.org/img/accuracy/BrierEI3-2.png" alt="" />
Or, taking the aerial view again:
<img src="http://jonathanweisberg.org/img/accuracy/BrierEI3-2above.png" alt="" />
Yep, looks like the red dot is located at $q_1 = 6/ 10$ and $q_2 = 3/ 10$, i.e. at $\q = (6/ 10, 3/ 10, 1/ 10)$, also known as $\p$. So, once again, it seems expected inaccuracy is minimized when $\q = \p$.</p>
<p>So let’s prove that that’s how it always is.</p>
<h2 id="a-proof">A Proof</h2>
<p>We’ll need a little notation: I’m going to write $\EIpq$ for the expected inaccuracy of point $\q$, according to $\p$.</p>
<p>Now recall our formula for expected inaccuracy:
$$
\begin{align}
\EIpq
&= \quad p_1 \left( (q_1 - 1)^2 + q_2^2 + q_3^2 \right)\\<br />
&\quad + p_2 \left( q_1^2 + (q_2 - 1)^2 + q_3^2 \right)\\<br />
&\quad + p_3 \left( q_1^2 + q_2^2 + (q_3 - 1)^2 \right).
\end{align}
$$
How do we find the point $\q$ that minimizes this mess?</p>
<p>Originally this post used some pretty tedious calculus. But thanks to a hot tip from <a href="https://mathematics.stanford.edu/people/department-directory/name/jonathan-love/" target="_blank">Jonathan Love</a>, we can get by just with algebra.</p>
<p>First we need to expand the squares in our big ugly sum:
$$
\begin{align}
\EIpq
&= \quad p_1 \left( q_1^2 - 2q_1 + 1 + q_2^2 + q_3^2 \right)\\<br />
&\quad + p_2 \left( q_1^2 + q_2^2 - 2q_2 + 1 + q_3^2 \right)\\<br />
&\quad + p_3 \left( q_1^2 + q_2^2 + q_3^2 - 2q_3 + 1 \right).
\end{align}
$$
Then we’ll gather some common terms and rearrange things:
$$
\begin{align}
\EIpq &= (p_1 + p_2 + p_3)\left(q_1^2 + q_2^2 + q_3^2 + 1 \right) - 2p_1q_1 - 2p_2q_2 - 2p_3q_3.\\<br />
\end{align}
$$
Since $p_1 + p_2 + p_3 = 1$, that simplifies to:
$$
\begin{align}
\EIpq &= q_1^2 + q_2^2 + q_3^2 + 1 - 2p_1q_1 - 2p_2q_2 - 2p_3q_3.\\<br />
\end{align}
$$</p>
<p>Now we’ll use <a href="https://mathematics.stanford.edu/people/department-directory/name/jonathan-love/" target="_blank">Jonathan</a>’s ingenious trick. We’re going to add $p_1^2 + p_2^2 + p_3^2 - 1$ to this expression, <em>which doesn’t change where the minimum occurs</em>. If you shift every point on a graph upwards by the same amount, the minimum is still in the same place. (Imagine everybody in the world grows by an inch overnight; the shortest person in the world is still the shortest, despite being an inch taller.)</p>
<p>Then, magically, we get an expression that factors into something tidy:
$$
\begin{align}
&\phantom{=}\phantom{=} p_1^2 + p_2^2 + p_3^2 + q_1^2 + q_2^2 + q_3^2 - 2p_1q_1 - 2p_2q_2 - 2p_3q_3\\<br />
&= (p_1 - q_1)^2 + (p_2 - q_2)^2 + (p_3 - q_3)^2.
\end{align}
$$
And not just tidy, but easy to minimize. It’s a sum of squares, and squares are never negative. So the smallest possible value is $0$, which occurs when all the squares are $0$, i.e. when $q_1 = p_1$, $q_2 = p_2$, and $q_3 = p_3$.</p>
<p>So, the minimum of $\EIpq$ occurs in the same place, namely when $\q = \p$!</p>
<h2 id="the-nth-dimension">The Nth Dimension</h2>
<p>Now we can use the same idea to generalize to any number of dimensions. Since the steps are essentially identical, I’ll keep it short and (I hope) sweet.</p>
<p><strong>Theorem.</strong>
Given a probability assignment $\p = (p_1, \ldots, p_n)$, if inaccuracy is measured using Brier distance, then $\EIpq$ is uniquely minimized when $\q = \p$.</p>
<p><em>Proof.</em>
Let $\p = (p_1, \ldots, p_n)$ be a probability assignment, and let $\EIpq$ be the expected inaccuracy according to $\p$ of probability assignment $\q = (q_1, \ldots, q_n)$, measured using Brier distance.</p>
<p>First we simplify our expression for $\EIpq$ using algebra:
$$
\begin{align}
\EIpq
&= \quad p_1 \left( (q_1 - 1)^2 + q_2^2 + \ldots + q_n^2 \right)\\<br />
&\quad + p_2 \left( q_1^2 + (q_2 - 1)^2 + \ldots + q_n^2 \right)\\<br />
&\quad\quad \vdots\\<br />
&\quad + p_n \left( q_1^2 + q_2^2 + \ldots + q_{n-1}^2 + (q_n - 1)^2 \right)\\<br />
&= (p_1 + \ldots + p_n)\left( q_1^2 + \ldots + q_n^2 + 1\right) - 2 p_1 q_1 - \ldots - 2 p_n q_n\\<br />
&= q_1^2 + \ldots + q_n^2 + 1 - 2 p_1 q_1 - \ldots - 2 p_n q_n.
\end{align}
$$
Now, because $p_1^2 + \ldots + p_n^2 - 1$ is a constant, adding it to $\EIpq$ doesn’t change where the minimum occurs. So we can minimize instead:
$$
\begin{align}
&\phantom{=}\phantom{=} p_1^2 + \ldots + p_n^2 + q_1^2 + \ldots + q_n^2 - 2 p_1 q_1 - \ldots - 2 p_n q_n\\<br />
&= (p_1 - q_1)^2 + \ldots + (p_n - q_n)^2.
\end{align}
$$
Being a sum of squares, this expression can never be less than $0$, and it equals $0$ exactly when $\q = \p$. So $\EIpq$ is uniquely minimized there. <span style="float: right;">$\Box$</span></p>
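<p>A small corollary falls out of the proof, and it may help make the result vivid. Setting $\q = \p$ in the simplified expression for $\EIpq$ gives $\EIpp = 1 - (p_1^2 + \ldots + p_n^2)$, which is exactly the negative of the constant we added. So, as far as I can tell, the sum of squares we ended up with is just the <em>extra</em> inaccuracy $\p$ expects from adopting $\q$ instead of itself:
$$
\EIpq - \EIpp = (p_1 - q_1)^2 + \ldots + (p_n - q_n)^2.
$$
For instance, with $\p = (6/ 10, 3/ 10, 1/ 10)$ and $\q = (1/ 3, 1/ 3, 1/ 3)$, the difference is $(4/ 15)^2 + (1/ 30)^2 + (7/ 30)^2 = 114/ 900 \approx 0.127$.</p>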
<h1 id="conclusion">Conclusion</h1>
<div class="text-center">
<object data="http://www.youtube.com/embed/MkmMxfCgewQ"
width="560" height="315" classboo="text-center"></object>
</div>
<p>So what did we learn? That Brier distance isn’t just “stable” in toy cases like a coin-toss. It’s also stable in toy cases with any finite number of outcomes.</p>
<p>No matter how many outcomes are under consideration, each probability assignment expects itself to do best at minimizing inaccuracy, if we use Brier distance to measure inaccuracy.</p>
<p>To go beyond toy cases, we’d have to extend this result to cases with infinite numbers of possibilities. And I haven’t even begun to think about how to do that.</p>
<p>Instead, next time we’ll look at what happens in $3+$ dimensions when we use Euclidean distance instead of Brier distance. And it’s actually kind of interesting! It turns out Euclidean distance is still improper in $3+$ dimensions, but not necessarily in the same way as in $2$ dimensions. More on that next time…</p>
Gender & Journal Submissions
http://jonathanweisberg.org/post/Author%20Gender/
Thu, 26 Jan 2017 10:36:10 -0500http://jonathanweisberg.org/post/Author%20Gender/
<p>Does an author’s gender affect the fate of their submission to an academic journal? It’s a big question, even if we restrict ourselves to philosophy journals.</p>
<p>But we can make a start by using <a href="http://www.ergophiljournal.org" target="_blank"><em>Ergo</em></a> as one data-point. I’ll examine two questions:</p>
<ul>
<li><p>Question 1: Does gender affect the decision rendered at <em>Ergo</em>? Are men more likely to have their papers accepted, for example?</p></li>
<li><p>Question 2: Does gender affect time-to-decision at <em>Ergo</em>? For example, do women have to wait longer on average for a decision?</p></li>
</ul>
<h1 id="background">Background</h1>
<p>Some important background and caveats before we begin:</p>
<ul>
<li><p>Our data set goes back to Feb. 11, 2015, when <em>Ergo</em> moved to its current online system for handling submissions. We do have records going back to Jun. 2013, when the journal launched. But integrating the data from the two systems is a programming hassle I haven’t faced up to yet.</p></li>
<li><p>We’ll exclude submissions that were withdrawn by the author before a decision could be rendered. Usually, when an author withdraws a submission, it’s so that they can resubmit a trivially-corrected manuscript five minutes later. So this data mostly just gets in the way.</p></li>
<li><p>We’ll also exclude submissions that were still under review as of Jan. 1, 2017, since the data there is incomplete.</p></li>
<li><p>The gender data we’ll be using was gathered manually by <em>Ergo</em>’s managing editors (me and Franz Huber). In most cases we didn’t know the author personally. So we did a quick google to see whether we could infer the author’s gender based on public information, like pronouns and/or pictures. When we weren’t confident that we could, we left their gender as “unknown”.</p></li>
<li><p>This analysis covers only men and women, because there haven’t yet been any cases where we could confidently infer that an author identified as another gender. And the “gender unknown” cases are too few for reliable statistical analysis.</p></li>
<li><p>Since we only have data for the gender of the submitting author, our analysis will overlook co-authors.</p></li>
</ul>
<p>With that in mind, a brief overview: our data set contains $696$ submissions over almost two years (Feb. 11, 2015 up to Jan. 1, 2017), but only $639$ of these are included in this analysis. The $52$ submissions that were in-progress as of Jan. 1, 2017, or were withdrawn by the author, have been excluded. Another $5$ cases where the author’s gender was unknown were also excluded.</p>
<h1 id="gender-decisions">Gender & Decisions</h1>
<p>Does an author’s gender affect the journal’s decision about whether their submission is accepted? We can slice this question a few different ways:</p>
<ol>
<li><p>Does gender affect the first-round decision to reject/accept/R&R?</p></li>
<li><p>Does gender affect the likelihood of desk-rejection, specifically?</p></li>
<li><p>Does gender affect the chance of converting an R&R into an accept?</p></li>
<li><p>Does gender affect the ultimate decision to accept/reject (whether via an intervening R&R or not)?</p></li>
</ol>
<p>The short answer to all these questions is: no, at least not in a statistically significant way. But there are some wrinkles. So let’s take each question in turn.</p>
<h2 id="first-round-decisions">First-Round Decisions</h2>
<p>Does gender affect the first-round decision to reject/accept/R&R?</p>
<p><em>Ergo</em> has two kinds of R&R, Major Revisions and Minor Revisions. Here are the raw numbers:</p>
<table>
<thead>
<tr>
<th align="left"></th>
<th align="right">Reject</th>
<th align="right">Major Revisions</th>
<th align="right">Minor Revisions</th>
<th align="right">Accept</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">Female</td>
<td align="right">76</td>
<td align="right">10</td>
<td align="right">2</td>
<td align="right">0</td>
</tr>
<tr>
<td align="left">Male</td>
<td align="right">438</td>
<td align="right">41</td>
<td align="right">15</td>
<td align="right">5</td>
</tr>
</tbody>
</table>
<p>Graphically, in terms of percentages:</p>
<p><img src="http://jonathanweisberg.org/img/author_gender_files/unnamed-chunk-4-1.png" alt="" /><!-- --></p>
<p>There are differences here, of course: women were asked to make major revisions more frequently than men, for example. And men received verdicts of minor revisions or outright acceptance more often than women.</p>
<p>Are these differences significant? They don’t look it from the bar graph. And a standard chi-square test agrees.<sup class="footnote-ref" id="fnref:1"><a rel="footnote" href="#fn:1">1</a></sup></p>
<h2 id="desk-rejections">Desk Rejections</h2>
<p>Things are a little more interesting if we separate out desk rejections from rejections-after-external-review. The raw numbers:</p>
<table>
<thead>
<tr>
<th align="left"></th>
<th align="right">Desk Reject</th>
<th align="right">Non-desk Reject</th>
<th align="right">Major Revisions</th>
<th align="right">Minor Revisions</th>
<th align="right">Accept</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">Female</td>
<td align="right">61</td>
<td align="right">15</td>
<td align="right">10</td>
<td align="right">2</td>
<td align="right">0</td>
</tr>
<tr>
<td align="left">Male</td>
<td align="right">311</td>
<td align="right">127</td>
<td align="right">41</td>
<td align="right">15</td>
<td align="right">5</td>
</tr>
</tbody>
</table>
<p>In terms of percentages for men and women:</p>
<p><img src="http://jonathanweisberg.org/img/author_gender_files/unnamed-chunk-9-1.png" alt="" /><!-- --></p>
<p>The differences here are more pronounced. For example, women had their submissions desk-rejected more frequently, a difference of about 8.5%.</p>
<p>But once again, the differences are not statistically significant according to the standard chi-square test.<sup class="footnote-ref" id="fnref:2"><a rel="footnote" href="#fn:2">2</a></sup></p>
<h2 id="ultimate-decisions">Ultimate Decisions</h2>
<p>What if we just consider a submission’s ultimate fate—whether it’s accepted or rejected in the end? Here the results are pretty clear:</p>
<table>
<thead>
<tr>
<th align="left"></th>
<th align="right">Reject</th>
<th align="right">Accept</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">Female</td>
<td align="right">78</td>
<td align="right">5</td>
</tr>
<tr>
<td align="left">Male</td>
<td align="right">450</td>
<td align="right">38</td>
</tr>
</tbody>
</table>
<p><img src="http://jonathanweisberg.org/img/author_gender_files/unnamed-chunk-13-1.png" alt="" /><!-- --></p>
<p>Pretty obviously there’s no significant difference, and a chi-square test agrees.<sup class="footnote-ref" id="fnref:3"><a rel="footnote" href="#fn:3">3</a></sup></p>
<h2 id="conversions">Conversions</h2>
<p>Our analysis so far suggests that men and women probably have about equal chance of converting an R&R into an accept. Looking at the numbers directly corroborates that thought:</p>
<table>
<thead>
<tr>
<th align="left"></th>
<th align="right">Reject</th>
<th align="right">Accept</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">Female</td>
<td align="right">2</td>
<td align="right">5</td>
</tr>
<tr>
<td align="left">Male</td>
<td align="right">12</td>
<td align="right">33</td>
</tr>
</tbody>
</table>
<p><img src="http://jonathanweisberg.org/img/author_gender_files/unnamed-chunk-16-1.png" alt="" /><!-- --></p>
<p>As before, a standard chi-square test agrees.<sup class="footnote-ref" id="fnref:4"><a rel="footnote" href="#fn:4">4</a></sup> Though, of course, the numbers here are small and shouldn’t be given too much weight.</p>
<h2 id="conclusion-so-far">Conclusion So Far</h2>
<p>None of the data so far yielded a significant difference between men and women. None even came particularly close (see the footnotes for the numerical details). So it seems the journal’s decisions are independent of gender, or nearly so.</p>
<h1 id="gender-time-to-decision">Gender & Time-to-Decision</h1>
<p>Authors don’t just care what decision is rendered, of course. They also care that decisions are made quickly. Can men and women expect similar wait-times?</p>
<p>The average time-to-decision is 23.3 days. But for men it’s 23.9 days while for women it’s only 19.6. This looks like a significant difference. And although it isn’t quite significant according to a standard $t$ test, it very nearly is.<sup class="footnote-ref" id="fnref:5"><a rel="footnote" href="#fn:5">5</a></sup></p>
<p>What might be going on here? Let’s look at the observed distributions for men and women:</p>
<p><img src="http://jonathanweisberg.org/img/author_gender_files/unnamed-chunk-19-1.png" alt="" /><!-- --></p>
<p>A striking difference is that there are so many more submissions from men than from women. But otherwise these distributions actually look quite similar. Each is a bimodal distribution with one peak for desk-rejections around one week, and another, smaller peak for externally reviewed submissions around six or seven weeks.</p>
<p>We noticed earlier that women had more desk-rejections by about 8.5%. And while that difference wasn’t statistically significant, it may still be what’s causing the almost-significant difference we see with time-to-decision (especially if men also have a few extra outliers, as seems to be the case).</p>
<p>To test this hypothesis, we can separate out desk-rejections and externally reviewed submissions. Graphically:</p>
<p><img src="http://jonathanweisberg.org/img/author_gender_files/unnamed-chunk-20-1.png" alt="" /><!-- --><img src="http://jonathanweisberg.org/img/author_gender_files/unnamed-chunk-20-2.png" alt="" /><!-- --></p>
<p>Aside from the raw numbers, the distributions for men and for women look very similar. And if we run separate $t$ tests for desk-rejections and for externally reviewed submissions, gender differences are no longer close to significance. For desk-rejections $p = 0.24$. And for externally reviewed submissions $p = 0.46$.</p>
<h1 id="conclusions">Conclusions</h1>
<p>Apparently an author’s gender has little or no effect on the content or speed of <em>Ergo</em>’s decision. I’d <em>like</em> to think this is a result of the journal’s <a href="http://www.ergophiljournal.org/review.html" target="_blank">strong commitment to triple-anonymous review</a>. But without data from other journals to make comparisons, we can’t really infer much about potential causes. And, of course, we can’t generalize to other journals with any confidence, either.</p>
<h1 id="technical-notes">Technical Notes</h1>
<p>This post was written in R Markdown and the source is <a href="https://github.com/jweisber/rgo/blob/master/author%20gender/author%20gender.Rmd" target="_blank">available on GitHub</a>. I’m new to both R and classical statistics, and this post is a learning exercise for me. So I encourage you to check the code and contact me with corrections.</p>
<div class="footnotes">
<hr />
<ol>
<li id="fn:1"><p>Specifically, $\chi^2(3, N = 587) = 1.89$, $p = 0.6$. This raises the question of power, and for a small effect size ($w = .1$) power is only about $0.51$. But it increases quickly to $0.99$ at $w = .2$.</p>
<p>Given the small numbers in some of the columns though, especially the Accept column, we might prefer a different test than $\chi^2$. The more precise $G$ test yields $p = 0.46$, still fairly large. And Fisher’s exact test yields $p = 0.72$.</p>
<p>We might also do an ordinal analysis, since decisions have a natural desirability ordering for authors: Accept > Minor Revisions > Major Revisions > Reject. We can test for a linear trend by assigning integer ranks from 4 down through 1 <a href="http://ca.wiley.com/WileyCDA/WileyTitle/productCd-0470463635.html" target="_blank">(Agresti 2007)</a>. A test of the <a href="https://onlinecourses.science.psu.edu/stat504/node/91" target="_blank">Mantel-Haenszel statistic</a> $M^2$ then yields $p = 0.82$.</p>
<a class="footnote-return" href="#fnref:1"><sup>[return]</sup></a></li>
<li id="fn:2"><p>Here we have $\chi^2(4, N = 587) = 4.64$, $p = 0.33$. As before, the power for a small effect ($w = .1$) is only middling, about 0.46, but increases quickly to near certainty ($0.98$) by $w = .2$.</p>
<p>Instead of $\chi^2$ we might again consider a $G$ test, which yields $p = 0.24$, or Fisher’s exact test which yields $p = 0.37$.</p>
<p>For an ordinal test using the ranking Desk Reject < Non-desk Reject < Major Revisions < etc., the Mantel-Haenszel statistic $M^2$ now yields $p = 0.39$.</p>
<a class="footnote-return" href="#fnref:2"><sup>[return]</sup></a></li>
<li id="fn:3">Here we have $\chi^2(1, N = 571) = 0.11$, $p = 0.74$.
<a class="footnote-return" href="#fnref:3"><sup>[return]</sup></a></li>
<li id="fn:4">$\chi^2(1, N = 52) = 0$, $p = 1$.
<a class="footnote-return" href="#fnref:4"><sup>[return]</sup></a></li>
<li id="fn:5">Specifically, $t(137.71) = 1.78$, $p = 0.08$. A $t$ test may not actually be the best choice here, though, since (as we’re about to see) the observed distributions aren’t normal, but rather bimodal. Still, we can compare this result to alternatives like the non-parametric Wilcoxon-Mann-Whitney test ($p = 0.1$) or the bootstrap-$t$ ($p = 0.07$). These $p$-values don’t quite cross the customary $\alpha = .05$ threshold either, but they are still small.
<a class="footnote-return" href="#fnref:5"><sup>[return]</sup></a></li>
</ol>
</div>
Accuracy for Dummies, Part 2: from Euclid to Brier
http://jonathanweisberg.org/post/Accuracy%20for%20Dummies%20-%20Part%202/
Wed, 18 Jan 2017 00:00:00 -0500http://jonathanweisberg.org/post/Accuracy%20for%20Dummies%20-%20Part%202/
<p><a href="http://jonathanweisberg.org/post/Accuracy%20for%20Dummies%20-%20Part%201/">Last time</a> we saw that Euclidean distance is an “unstable” way of measuring inaccuracy. Given one assignment of probabilities, you’ll expect some other assignment to be more accurate (unless the first assignment is either perfectly certain or perfectly uncertain).</p>
<p>That’s why accuraticians don’t use good ol’ Euclidean distance.</p>
<p><img src="https://crookedrunbrewing.files.wordpress.com/2014/05/scientician.png?w=240" alt="Just ask this accuratician" /></p>
<p>Instead they use… well, there are lots of alternatives. But the closest thing to a standard one is <em>Brier distance</em>: the square of Euclidean distance.</p>
<p>Here’s Euclid’s formula for the distance between two points $(a, b)$ and $(c, d)$ in the plane:
$$ \sqrt{ (a - c)^2 + (b - d)^2 }. $$
And here’s Brier’s:
$$ (a - c)^2 + (b - d)^2. $$
So, to get from Euclid to Brier, you just take away the square root.</p>
<p>That makes a world of difference, it turns out. Brier distance isn’t unstable the way Euclidean distance is. But we’ll see that it’s enough like Euclidean distance to vindicate the argument for the laws of probability we began with last time.</p>
<p>But first, a fun fact.</p>
<h1 id="fun-fact">Fun Fact</h1>
<p>Brier distance comes from the world of weather forecasting. Glenn W. Brier worked for the U.S. Weather Bureau, and in <a href="http://docs.lib.noaa.gov/rescue/mwr/078/mwr-078-01-0001.pdf" target="_blank">a 1950 paper</a> he proposed his formula as a way of measuring how well a weather forecaster is doing at predicting the weather.</p>
<p>Suppose you say there’s a 70% chance of rain. If it does rain, you’re hardly wrong, but you’re not exactly right either. Brier suggested assessing a forecaster’s probabilities by taking the square of the difference from $1$ when it rains, and from $0$ when it doesn’t.</p>
<p>Well, actually, he proposed taking the <em>average</em> of those squares. But we’ll follow the recent philosophical literature and keep it simple: we’ll just use the sum of squares rather than its average.</p>
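<p>Here is a minimal Python sketch of the scoring idea just described. The function names <code>brier_sum</code> and <code>mean_brier</code> are made up for illustration, and treating the “average” as an average over forecasting occasions is an assumption of the sketch. A forecast is written as a pair of probabilities for (rain, no rain).</p>

```python
def brier_sum(forecast, outcome):
    """Sum of squared differences between a forecast and what happened.

    forecast: probabilities, e.g. (0.7, 0.3) for (rain, no rain).
    outcome:  1 for the event that occurred, 0 elsewhere, e.g. (1, 0).
    """
    return sum((f - o) ** 2 for f, o in zip(forecast, outcome))


def mean_brier(forecasts, outcomes):
    """Average of brier_sum over several forecasting occasions (an assumed reading)."""
    scores = [brier_sum(f, o) for f, o in zip(forecasts, outcomes)]
    return sum(scores) / len(scores)


forecast = (0.7, 0.3)  # a 70% chance of rain

print("rain:   ", round(brier_sum(forecast, (1, 0)), 4))
print("no rain:", round(brier_sum(forecast, (0, 1)), 4))

# The same forecast issued on one rainy day and one dry day (made-up occasions).
print("average:", round(mean_brier([forecast, forecast], [(1, 0), (0, 1)]), 4))
```

<p>With a 70% forecast the score is small when it rains and large when it doesn’t, which is the behaviour we want from a measure of <em>in</em>accuracy. In the rest of this post only the single-occasion sum matters.</p>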
<p>Now on to the substance. Two facts about Brier distance make it useful as a replacement for Euclidean distance.</p>
<h1 id="euclid-and-brier-are-ordinally-equivalent">Euclid and Brier are Ordinally Equivalent</h1>
<p>First, Brier distance is <em>ordinally equivalent</em> to Euclidean distance. Meaning: whenever a distance is larger according to Euclid, it’s larger according to Brier too. And vice versa.</p>
<p>How do we know that? Because Brier is just Euclid squared, and squaring a larger number always results in a larger number (for positive numbers like distances, anyway). If $D$ is the distance from Toronto to the sun, and $d$ is the distance from Toronto to the moon, then $D^2 > d^2$. It’s further to the sun than to the moon, both in terms of Brier distance and Euclidean distance.</p>
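<p>The squaring step can be spelled out in one line. For any two non-negative distances $D$ and $d$:
$$ D^2 - d^2 = (D - d)(D + d). $$
If $D > d \geq 0$ then both factors on the right are positive, so $D^2 > d^2$. Conversely, if $D^2 > d^2$ then $D + d > 0$, so the factor $D - d$ must be positive too. That second direction is the “vice versa”.</p>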
<p>So, when we’re comparing distances from the truth, Brier distance behaves a lot like Euclidean distance. In particular, what we learned from our opening diagram about Euclidean distance holds for Brier distance, too.</p>
<p><img src="http://jonathanweisberg.org/img/accuracy/2D%20Dominance%20Diagram%20-%20400px.png" alt="Opening diagram" /></p>
<p>Not only is $c'$ closer to both vertices than $c^*$ in Euclidean terms, it’s also closer in terms of Brier distance.</p>
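<p>As a quick check, take the coordinates used in Part 1: $c^* = (.7, .5)$ and $c' = (.6, .4)$. (The diagram isn’t labeled with numbers here, so these values are an assumption carried over from that post.) The Brier distances to the vertex $(1,0)$ are:
$$
\begin{align}
c^*: &\quad (.7 - 1)^2 + (.5 - 0)^2 = .09 + .25 = .34,\\
c': &\quad (.6 - 1)^2 + (.4 - 0)^2 = .16 + .16 = .32,
\end{align}
$$
and to the vertex $(0,1)$:
$$
\begin{align}
c^*: &\quad (.7 - 0)^2 + (.5 - 1)^2 = .49 + .25 = .74,\\
c': &\quad (.6 - 0)^2 + (.4 - 1)^2 = .36 + .36 = .72.
\end{align}
$$
So $c'$ wins on both counts. The Euclidean distances are just the square roots of these four numbers, which is why the comparison comes out the same way.</p>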
<h1 id="brier-is-stable">Brier is Stable</h1>
<p>Second, Brier distance doesn’t lead to the kind of instability that made Euclidean distance problematic. To see why, let’s rerun our expected inaccuracy calculations from last time, but using Brier distance instead of Euclid.</p>
<p>Suppose your credences in Heads and Tails are $p$ and $1-p$. What’s the expected inaccuracy of having some credence $q$ in Heads, and $1-q$ in Tails?</p>
<p>Well, the Brier distance between $(q, 1-q)$ and $(1,0)$ is:
$$(q - 1)^2 + ((1-q) - 0)^2.$$
And the Brier distance between $(q, 1-q)$ and $(0,1)$ is:
$$(q - 0)^2 + ((1-q) - 1)^2.$$
We don’t know which of $(1,0)$ or $(0,1)$ is the “true” one. But we have assigned them the probabilities $p$ and $1-p$, respectively. So we can calculate the expected inaccuracy of $(q, 1-q)$, written $EI(q, 1-q)$:
$$
\begin{align}
EI(q, 1-q) &= p \left( (q - 1)^2 + ((1-q) - 0)^2 \right)\\
&\quad + (1-p) \left( (q - 0)^2 + ((1-q) - 1)^2 \right)\\
&= 2 p (1 - q)^2 + 2(1-p) q^2\\
&= 2 p q^2 - 4pq + 2p + 2q^2 - 2pq^2\\
&= 2q^2 - 4pq + 2p
\end{align}
$$
Now that last line might look like a mess. But it’s really just a quadratic function of the variable $q$. Remember: we’re treating $p$ as a constant since that’s the credence you hold. And we’re looking at potential values of $q$ to see which ones minimize the quantity $EI(q, 1-q)$, given a fixed credence of $p$ in heads.</p>
<p>So which value of $q$ minimizes this quadratic? You might remember from algebra class that the graph of a quadratic function of the form:
$$
ax^2 + bx + c
$$
is a parabola and, when $a > 0$, it opens upward with the bottom of the bowl located at $x = -b/(2a)$. (Or, if you know some calculus, you can take the derivative and set it equal to $0$. Since the derivative here is $2ax + b$, setting it equal to $0$ yields, again, $x = -b/(2a)$.)</p>
<p>In the case of our formula, we have $a = 2$ and $b = -4p$. So the minimum happens when $q = 4p/4 = p$. In other words, given credence $p$ in heads, expected inaccuracy is minimized by sticking with that same credence, i.e. assigning $q = p$.</p>
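<p>Another way to see the same thing, without the vertex formula, is to complete the square:
$$
\begin{align}
2q^2 - 4pq + 2p &= 2(q^2 - 2pq + p^2) + 2p - 2p^2\\
&= 2(q - p)^2 + 2p(1 - p).
\end{align}
$$
The first term is never negative and vanishes only at $q = p$, so the minimum expected inaccuracy is $2p(1-p)$. With $p = .6$, for instance, that comes to $.48$, compared with $EI(1, 0) = 2 - 2.4 + 1.2 = .8$ for the extreme assignment.</p>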
<p>So, to complement our result about Euclidean distance from last time, we have a</p>
<p><strong>Theorem.</strong> Suppose $p \in [0,1]$. Then, according to the probability assignment $(p, 1-p)$, the expected Brier distance of any alternative assignment $(q, 1-q)$ from the points $(1,0)$ and $(0,1)$ is uniquely minimized when $p = q$.</p>
<p><em>Proof.</em> Scroll up! <span style="float: right;">$\Box$</span></p>
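<p>If you’d rather see the theorem at work than trust the algebra, here is a rough numerical check in Python. It is only a sketch: the function names are made up, and the grid search over $q = 0, .01, \ldots, 1$ is an illustrative stand-in for the proof, not a replacement for it.</p>

```python
def expected_brier(p, q):
    """Expected Brier inaccuracy of (q, 1-q), judged from credence (p, 1-p)."""
    from_heads = (q - 1) ** 2 + ((1 - q) - 0) ** 2   # Brier distance from (1, 0)
    from_tails = (q - 0) ** 2 + ((1 - q) - 1) ** 2   # Brier distance from (0, 1)
    return p * from_heads + (1 - p) * from_tails


def best_q(p, steps=100):
    """Grid point q = i/steps with the smallest expected inaccuracy."""
    grid = [i / steps for i in range(steps + 1)]
    return min(grid, key=lambda q: expected_brier(p, q))


for p in (0.0, 0.25, 0.5, 0.6, 1.0):
    print("p =", p, " best q =", best_q(p))
```

<p>For each of these values of $p$, the best $q$ on the grid is $p$ itself, as the theorem says it should be.</p>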
<h1 id="proper-scoring-rules">Proper Scoring Rules</h1>
<p>When a measure of inaccuracy is stable like this, it’s called <em>proper</em> (or sometimes: <em>immodest</em>).</p>
<p>There are lots of other proper ways of measuring inaccuracy besides Brier. But Brier tends to be the default among philosophers writing in the accuracy framework, at least as a working example. Why?</p>
<p>My impression (though I’m no guru) is that it’s the default because:</p>
<ol>
<li>Brier is a lot like Euclidean distance, as we saw. So it’s easier and more intuitive to work with than some of the alternatives.</li>
<li>Brier tends to be representative of other proper/immodest rules. If you discover something philosophically interesting using Brier, there’s a good chance it holds for many other proper scoring rules.</li>
<li>Brier has other nice mathematical properties which, according to authors like Richard Pettigrew, make it The One True Measure of Inaccuracy. (It may have some odd features too, though: see <a href="http://m-phi.blogspot.ca/2015/03/a-strange-thing-about-brier-score.html" target="_blank">this post</a> by Brian Knab and Miriam Schoenfield, for example.)</li>
</ol>
<p>How does our starting argument for the laws of probability fare if we use other proper scoring rules, besides Brier? Really well, it turns out!</p>
<p>The key fact our diagram illustrates doesn’t just hold for Euclidean distance and Brier distance. Speaking <em>very</em> loosely: it holds for any proper way of measuring distance (but do see sections 8 and 9 of <a href="https://philpapers.org/rec/JOYAAC" target="_blank">Joyce 2009</a> for the details before getting carried away with this generalization; or see Theorem 4.3.5 of <a href="https://global.oup.com/academic/product/accuracy-and-the-laws-of-credence-9780198732716" target="_blank">Pettigrew 2016</a>).</p>
<p>Proving that requires grinding through a good deal of math, though. So in these posts we’re going to stick with Brier distance, at least for a while.</p>
<h1 id="begging-the-question">Begging the Question?</h1>
<p>We started these posts with an illustration of an influential argument for the laws of probability. But we quickly switched to <em>assuming</em> those very same laws in the arguments that followed.</p>
<p>For example, to illustrate the instability of Euclidean distance, I chose a point on the diagonal of our diagram, $(.6, .4)$. And in the theorem that generalized that example, I assumed probabilistic assignments like $(p, 1-p)$ and $(q, 1-q)$, which add up to $1$.</p>
<p>So didn’t we beg the question when we motivated switching from Euclid to Brier?</p>
<p>To some extent: yes. We are assuming that reasonable ways of measuring inaccuracy can’t be so hostile to the laws of probability that they make almost all probability assignments unstable.</p>
<p>But also: no. We aren’t assuming that the laws of probability are absolute and inviolable, just that they’re reasonable <em>sometimes</em>. Euclidean distance would rule out probabilistic credences on pretty much all occasions. So it conflicts with the very modest thought that following the laws of probability is <em>occasionally</em> reasonable. So, even if you’re just a little bit open to the idea of probability theory, Euclidean distance will seem pretty unfriendly.<sup class="footnote-ref" id="fnref:1"><a rel="footnote" href="#fn:1">1</a></sup></p>
<p>Perhaps most importantly, though: the motivation I’ve given you here for moving from Euclid to Brier isn’t the official one you’ll find in an actual, bottom-up argument for probability theory, like <a href="https://richardpettigrew.wordpress.com/accuracy-book/" target="_blank">Richard Pettigrew’s</a>. His argument starts from a much more abstract place. He starts with axioms that any measure of inaccuracy must obey, and then narrows things down to Brier.</p>
<p>So there’s the official story and the unofficial story. This post gives you the unofficial story, to help you get started. Because the official story is often really hard to understand. Not only is the math way more abstract, but the philosophical motivations are often hard to suss out. Because—and this is just between you and me now—the people telling the official story actually started out with the unofficial story, and then worked backwards until they came up with an officially respectable story that doesn’t beg the question quite so obviously.</p>
<p>Ok, that’s unfair. Here’s a more even-handed (and better-informed) way of putting it, from <a href="http://ndpr.nd.edu/news/70705-accuracy-and-the-laws-of-credence/" target="_blank">Kenny Easwaran’s review</a> of Pettigrew’s book:</p>
<blockquote>
<p>Some philosophers have a vision of what they do as starting from unassailable premises, and giving an ironclad argument for a conclusion. However, I think we’ve all often seen cases where these arguments are weaker than they seem to the author, and with the benefit of a bit of distance, one can often recognize how the premises were in fact motivated by an attempt to justify the conclusion, which was chosen in advance. Pettigrew avoids the charade of pretending to have come up with the premises independently of recognizing that they lead to the conclusions of his arguments. Instead, he is open about having chosen target conclusions in advance […] and investigated what collection of potentially plausible principles about accuracy and epistemic decision theory will lead to those conclusions.</p>
</blockquote>
<div class="footnotes">
<hr />
<ol>
<li id="fn:1">This argument is essentially drawn from <a href="https://philpapers.org/rec/JOYAAC" target="_blank">(Joyce 2009)</a>.
<a class="footnote-return" href="#fnref:1"><sup>[return]</sup></a></li>
</ol>
</div>
The Thursday Conundrum
http://jonathanweisberg.org/post/The%20Thursday%20Conundrum/
Mon, 16 Jan 2017 10:14:00 -0500http://jonathanweisberg.org/post/The%20Thursday%20Conundrum/
<p>In <a href="http://jonathanweisberg.org/post/An%20Editors%20Favourite%20Days/">an earlier post</a> we saw that Mondays and Thursdays are good for editors, at least at <a href="http://www.ergophiljournal.org/" target="_blank"><em>Ergo</em></a>. Potential referees say yes more often when invited on these days. But why?</p>
<p>Mondays aren’t too puzzling. It’s the start of a new week, so people are fresh, and maybe just a little deluded about how productive the coming week will prove to be.</p>
<p>But Thursdays? They don’t seem especially special. I tried <a href="http://jonathanweisberg.org/post/An%20Editors%20Favourite%20Days/#theory">speculating <em>a priori</em></a> about what might be going on there. But it’d be nice to have a hypothesis that’s grounded in some data.</p>
<h1 id="virtual-mondays">Virtual Mondays?</h1>
<p>At first I thought it might be something subtle. Maybe the day the invite is sent isn’t as important as when the referee <em>responds</em>. An invitation sent on Thursday might not be answered until the following Monday. Whereas invites sent on Monday might tend to be answered the same day. Then Thursday would end up being a kind of virtual Monday, as far as referees responding to invites goes.</p>
<p>That didn’t seem to fit the data, though. For one thing, if you look at which days referees are least likely to <em>respond</em> negatively, it’s Mondays and Thursdays again:
<img src="http://jonathanweisberg.org/img/the_thursday_conundrum_files/unnamed-chunk-2-1.png" alt="" /><!-- --></p>
<p>For another, if you look at when referees respond to requests sent on Monday, it’s the same pattern as for requests sent on Thursday. In either case, referees typically respond the same day, or in the next couple of days. Here’s the pattern for Monday-invites:
<img src="http://jonathanweisberg.org/img/the_thursday_conundrum_files/unnamed-chunk-3-1.png" alt="" /><!-- -->
And here are Thursday-invites:
<img src="http://jonathanweisberg.org/img/the_thursday_conundrum_files/unnamed-chunk-4-1.png" alt="" /><!-- -->
In case you’re curious, here are all the days of the week, tiled according to day-of-invite:
<img src="http://jonathanweisberg.org/img/the_thursday_conundrum_files/unnamed-chunk-5-1.png" alt="" /><!-- -->
The pattern is pretty similar regardless of the day the invite is sent (except for the predictable effect of weekends, which tend to dampen responses across the board).</p>
<h1 id="the-beleaguered">The Beleaguered</h1>
<p>So my current hypothesis is much more flat-footed: it’s mainly a matter of when referees are busy. Monday they’re feeling fresh from the weekend, as I suggested. But why would Thursday be less overwhelming for referees? Maybe because they get fewer invitations then.</p>
<p>Let’s test that hypothesis. Here are the total numbers of invites sent out each day of the week, over the last two years at <em>Ergo</em>:
<img src="http://jonathanweisberg.org/img/the_thursday_conundrum_files/unnamed-chunk-6-1.png" alt="" /><!-- -->
The overall pattern is pretty much what you’d expect. Weekends are quiet (because even editors have lives). Then things pick up on Monday and Tuesday as the workweek begins, before declining again as the week wears on.</p>
<p>Note the uptick from Thursday to Friday, though: a difference of about 30 invitations. On a scale ranging from ~100 to ~250, that may be a non-trivial difference. And the same pattern shows up in both years we have data for:
<img src="http://jonathanweisberg.org/img/the_thursday_conundrum_files/unnamed-chunk-7-1.png" alt="" /><!-- -->
So maybe Thursdays are good because things are quieting down as the weekend approaches, and referees are receiving fewer requests. But by Friday it’s too late. The editors of the world are scrambling to lock down referees before the weekend. And, probably, referees aren’t keen to clutter up their desks just as they’re about to go into a weekend break.</p>
<h1 id="how-is-a-raven-like-a-writing-desk">How is a Raven Like a Writing Desk?</h1>
<p>But if Thursdays are good because fewer requests go out then, shouldn’t Mondays be terrible? We just saw that the start of the week is the busiest time in terms of the number of requests sent to referees.</p>
<p>My guess is that Monday and Thursday are to be explained somewhat differently. Thursdays are distinguished by their quietude, whereas Mondays are marked by vim and vigour. People are fresh, as I said. But also, the onslaught of the week’s workload hasn’t really hit yet.</p>
<p>In support of this last hypothesis, notice that the following pattern is quite robust: weekends are quiet, followed by a burst of activity early in the week, followed by decline towards the next weekend. We saw this pattern with editors sending requests to referees. But we see it other places too.</p>
<p>For example, here’s how the quantity of submissions the journal receives varies over the week:
<img src="http://jonathanweisberg.org/img/the_thursday_conundrum_files/unnamed-chunk-8-1.png" alt="" /><!-- -->
And here are the numbers of referee reports completed each day:
<img src="http://jonathanweisberg.org/img/the_thursday_conundrum_files/unnamed-chunk-9-1.png" alt="" /><!-- -->
Pretty clearly, authors, editors, and referees are all quietest on the weekends, and most active at the week’s start. (We also see in this last graph, as with editors contacting referees, that there’s a feeble resurgence toward week’s end—presumably in an attempt to clear the docket before the weekend.)</p>
<h1 id="conclusion">Conclusion</h1>
<p>So here’s my theory, at least for now.</p>
<p>Referees are game on Mondays for the obvious reasons: they’ve had the weekend to recharge and catch up, and the onslaught of Monday’s and Tuesday’s new submissions—and the corresponding wave of invitations to referees—hasn’t reverberated out into the referee-verse just yet. (Not to mention other demands, like teaching.)</p>
<p>Referees are game on Thursdays, too, but for somewhat different reasons. As the week wears on, authors and editors wind down, so referees find fewer invites in their inboxes. They’ve also completed their existing assignments earlier in the week, maybe even submitted their own papers. So they’re game, until the next day, Friday, when editors do their last-minute, pre-weekend scramble—which is especially ill-timed since referees are switching out of work-mode anyway.</p>
<p>It’s a bit unlovely and disunified, this explanation. But not entirely. Mondays and Thursdays do have something in common on this story. They’re both days when things are calmer for referees, albeit calm in different ways and for somewhat different reasons.</p>
<h1 id="technical-notes">Technical Notes</h1>
<p>This post was written in R Markdown and the source is <a href="https://github.com/jweisber/rgo/blob/master/thursday%20conundrum/the%20thursday%20conundrum.Rmd" target="_blank">available on GitHub</a>. I’m new to R and data science, and this post is a learning exercise for me. So I encourage you to check the code and contact me with corrections.</p>
Accuracy for Dummies, Part 1: Euclid Improper
http://jonathanweisberg.org/post/Accuracy%20for%20Dummies%20-%20Part%201/
Fri, 13 Jan 2017 09:53:00 -0500http://jonathanweisberg.org/post/Accuracy%20for%20Dummies%20-%20Part%201/
<p>If you’ve bumped into <a href="https://plato.stanford.edu/entries/epistemic-utility/#AccArg" target="_blank">the accuracy framework</a> before, you’ve probably seen a diagram like this one:</p>
<p><img src="http://jonathanweisberg.org/img/accuracy/2D Dominance Diagram - 400px.png" alt="" /></p>
<p>The vertices $(1,0)$ and $(0,1)$ represent two possibilities, whether a coin lands heads or tails in this example.</p>
<p>According to the laws of probability, the probability of heads and of tails must add up to $1$, like $.3 + .7$ or $.5 + .5$. So the diagonal line connecting the two vertices covers all the possible probability assignments… $(0,1)$, $(.3,.7)$, $(.5, .5)$, $(.9, .1)$, $(1,0)$, etc.$\newcommand{\vone}{(1,0)}$$\newcommand{\vtwo}{(0,1)}$</p>
<p>The diagram illustrates a key fact of the accuracy framework. Assignments that obey the laws of probability are always “closer to the truth” than assignments that violate those laws—<em>no matter what the truth turns out to be</em>. Given any point <em>not</em> on the line, there is a point <em>on</em> the line that is closer to <em>both</em> vertices $\vone$ and $\vtwo$. So, whether the coin lands heads or tails, you’ll be closer to the truth if your degrees of belief (a.k.a. “credences”) obey the laws of probability.</p>
<p>Take $c^*$, for example, which doesn’t lie on the diagonal line. Let’s assume $c^*$ is the point $(.7, .5)$, which violates the laws of probability: $.7 + .5 > 1$. Now compare that to $c'$, which does lie on the diagonal line. That’s the point $(.6, .4)$, which does obey the laws of probability: $.6 + .4 = 1$.</p>
<p>Well, $c'$ is closer to $\vone$ than $c^*$ is. Just look at the right-triangle connecting all three points: to get to $\vone$ from $c^*$ you have to travel along the hypotenuse. But you only have to travel the distance of one of the legs to get there from $c'$. And the same thinking applies to the other vertex, $\vtwo$. So $c'$ is closer to both vertices than $c^*$ is.</p>
<p>The same idea applies to <em>any</em> point in the unit square. If it’s not on the diagonal line, there’s a point on the line that will be closer to both vertices—because Pythagoras. Just go from whatever $c^*$ you start with to the closest point $c'$ on the diagonal. You’ll have two right-triangles, one for each vertex. So the $c'$ point will be closer to both vertices than the $c^*$ point you started with.</p>
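<p>The recipe of going to the closest point on the diagonal can be sketched in a few lines of Python. The helper names <code>nearest_on_diagonal</code> and <code>euclid</code> are made up for this illustration. The projection formula assumes the diagonal is the line $x + y = 1$, so the closest point is found by removing half of the excess $x + y - 1$ from each coordinate.</p>

```python
import math


def nearest_on_diagonal(x, y):
    """Closest point to (x, y) on the line x + y = 1."""
    excess = (x + y - 1) / 2
    return (x - excess, y - excess)


def euclid(a, b):
    """Euclidean distance between two points in the plane."""
    return math.sqrt((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2)


c_star = (0.7, 0.5)
c_prime = nearest_on_diagonal(*c_star)   # roughly (0.6, 0.4)

for vertex in [(1, 0), (0, 1)]:
    print(vertex,
          "from c*:", round(euclid(c_star, vertex), 4),
          "from c':", round(euclid(c_prime, vertex), 4))
```

<p>For the example in the text this gives roughly $.583$ versus $.566$ for the vertex $(1,0)$, and $.860$ versus $.849$ for $(0,1)$. You can swap in any other starting point in the unit square to try the same comparison.</p>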
<p>What’s more, no point on the line is closer to both vertices than any other. For example, if you move from $(.5, .5)$ to some other point on the line, you’ll move towards one vertex but away from the other. So if you’re off the line, you can always get closer to both vertices by moving onto the line. But once you’re on the line, there’s no move that guarantees you’ll be closer to the vertex representing the true outcome of the coin toss.</p>
<p>And that’s why you should obey the laws of probability, according to advocates of the accuracy framework. Violating the laws of probability takes you away from the truth, no matter what the truth turns out to be. Whereas if you obey the laws of probability, that doesn’t happen.</p>
<h1 id="hey-dummy">Hey, Dummy</h1>
<p>If you’re like me, the first time you see this argument you think to yourself: “Cool! The diagram gives me the key idea, I’ll worry about mathematical technicalities later (like, what if there are more than two possibilities?). For now let me just see where you’re going with this, epistemology-wise…”</p>
<p>But when I finally did sit down to work through the math, I found it much harder than I expected to answer some elementary questions. The answers to these questions were usually taken for granted in published work, or they went by so fast I wasn’t sure about the details.</p>
<p>I’m still working on filling in a lot of these gaps, as I work through <a href="https://global.oup.com/academic/product/accuracy-and-the-laws-of-credence-9780198732716?cc=ca&lang=en&" target="_blank">Richard Pettigrew’s excellent new book</a> and get up to speed (I hope!) with the latest research. I’m writing these posts to help me get clear on the basics, and hopefully help you do the same.</p>
<p><img src="https://s-media-cache-ak0.pinimg.com/736x/2e/46/00/2e4600f7eab945f936f00548b5498ba4.jpg" alt="Dennis Duffy: Hey Dummy" /></p>
<p>(Warning: since I’m learning this stuff as I go, my solutions and proofs won’t always be the best. In fact they’re bound to have errors. So I encourage you to contact me with corrections, and help improve these posts for others.)</p>
<p>Now on to today’s topic: Euclidean distance as a measure of accuracy.</p>
<h1 id="fear-of-a-euclidean-plane">Fear of a Euclidean Plane</h1>
<p>We just saw that the laws of probability keep you close to the truth in our coin-toss example, whatever the truth turns out to be. And by “close” we meant Euclidean distance, the kind of spatial distance familiar from grade-school geometry.</p>
<p>But people writing in the accuracy framework never use Euclidean distance. Why not? Because, it turns out, Euclidean distance is unstable!</p>
<p>“Unstable” how?</p>
<p>Well, if your aim is to be as close to the truth as possible in terms of Euclidean distance, then you will almost always be driven to change your opinion to something extreme: either $(1,0)$ or $(0,1)$. And not because you get some definite information about how the coin-flip turns out. But just because of the way Euclidean distance interacts with <a href="https://plato.stanford.edu/entries/rationality-normative-utility/#DefExpUti" target="_blank"><em>expected value</em></a>. (I’m going to assume you’re familiar with the notion of expected value. If not, you can read <a href="https://plato.stanford.edu/entries/rationality-normative-utility/#DefExpUti" target="_blank">the linked section</a> of the <em>SEP</em> article or do a bit of googling.)</p>
<p>Here’s how that happens. Suppose your credences in heads/tails are $(.6, .4)$: you’re $60\%$ confident the coin will land heads, and $40\%$ confident it’ll land tails. What’s your <em>expected inaccuracy</em>, then? If we think of accuracy as utility, and thus inaccuracy as disutility, how well can you expect to do by holding your current state of opinion?</p>
<p>Let’s run the calculation. We’ll write $EI(x, 1-x)$ for the expected inaccuracy of having credence $x$ in heads and $1-x$ in tails.
$$
\begin{align}
EI(.6, .4) &= .6 \sqrt{(.6 - 1)^2 + (.4 - 0)^2} + .4 \sqrt{(.6 - 0)^2 + (.4 - 1)^2}\\
&= .6 \sqrt{(-.4)^2 + .4^2} + .4 \sqrt{.6^2 + (-.6)^2}\\
&\approx .678823
\end{align}
$$
Ok, not bad. But now let’s compare that to how you can expect to do if you change your opinion to the extreme state $(1,0)$:
$$
\begin{align}
EI(1, 0) &= .6 \sqrt{(1 - 1)^2 + (0 - 0)^2} + .4 \sqrt{(1 - 0)^2 + (0 - 1)^2}\\
&\approx .565685
\end{align}
$$
Some things to keep in mind here:</p>
<ol>
<li>The numbers outside the square root symbols are $.6$ and $.4$ because those are your current beliefs, and we’re asking how well you expect to do <em>according to your current beliefs</em>.
<ul>
<li>The numbers inside the square roots are $1$ and $0$ because we’re asking how well you expect to do by adopting those extreme opinions. So those numbers describe the outcomes whose inaccuracy we want to evaluate and weigh.</li>
</ul></li>
<li>Remember, <strong>smaller</strong> numbers are <strong>better</strong> because we’re talking about <strong>in</strong>accuracy.</li>
</ol>
<p>And look: the extreme opinion $(1, 0)$ does <em>better</em> than the more moderate opinion you actually hold, $(.6, .4)$. The extreme opinion has lower expected inaccuracy (think: higher expected accuracy).</p>
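<p>If you want to reproduce these two numbers yourself, here is a small Python sketch. The function names are made up for illustration. The first argument of <code>expected_inaccuracy</code> supplies the weights (your current credences) and the second is the opinion being evaluated.</p>

```python
import math


def euclid(a, b):
    """Euclidean distance between two points of the same dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def expected_inaccuracy(credence, candidate):
    """Expected Euclidean inaccuracy of `candidate`, weighted by `credence`."""
    worlds = [(1, 0), (0, 1)]   # heads, tails
    return sum(w * euclid(candidate, world)
               for w, world in zip(credence, worlds))


current = (0.6, 0.4)
print("moderate:", round(expected_inaccuracy(current, current), 6))
print("extreme: ", round(expected_inaccuracy(current, (1, 0)), 6))
```

<p>The weights stay fixed at $(.6, .4)$ in both calls. Only the candidate opinion changes, which is what it means to evaluate the extreme opinion by your current lights.</p>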
<p>In fact, the extreme assignment does better than the moderate one <em>according to the moderate assignment itself</em>. So your moderate opinions end up undermining themselves. They drive you to hold more extreme opinions than you initially do, in the name of accuracy.</p>
<p>This isn’t an artifact of the particular example $(.6, .4)$. We can prove that an extreme state of opinion always does best in terms of expected inaccuracy—unless you are completely uncertain about the outcome, i.e. $(.5,.5)$.</p>
<p><strong>Theorem.</strong>
Suppose $p \in [0, 1]$ and $p \neq .5$. Then, according to the probability assignment $(p, 1-p)$, the expected Euclidean distance of any alternative assignment $(q, 1-q)$ from the points $(1,0)$ and $(0,1)$ is uniquely minimized by:
$$
q = \begin{cases}
0 & \mbox{ if } p < .5,\\
1 & \mbox{ if } p > .5.
\end{cases}
$$</p>
<p><em>Proof.</em>
Suppose $0 \leq p \leq 1$ and $p \neq .5$. According to the probability assignment $(p, 1-p)$, the expected Euclidean distance from $(1,0)$ and $(0,1)$ of any alternative assignment $(q, 1-q)$ is:
$$
\begin{align}
EI(q, 1-q) &= p \sqrt{(q - 1)^2 + ((1-q) - 0)^2}\\
&\quad + (1-p) \sqrt{(q - 0)^2 + ((1-q) - 1)^2}\\
&= p \sqrt{(q - 1)^2 + (1 - q)^2} + (1-p) \sqrt{q^2 + q^2}\\
&= p \sqrt{2} (1 - q) + (1-p) \sqrt{2} q\\
&= \sqrt{2} \left( p (1 - q) + (1-p) q \right).
\end{align}
$$
We are looking for the value of $q$ that minimizes the quantity on the last line. Since $\sqrt{2}$ is a positive constant, we can drop it and just seek to minimize:
$$ p (1 - q) + (1-p) q. $$
This quantity is a <a href="https://en.wikipedia.org/wiki/Weighted_arithmetic_mean" target="_blank">weighted average</a> of the two values $p$ and $(1-p)$, with the weights being $1-q$ and $q$, respectively. So the minimum possible value is just whichever of $p$ or $1-p$ is smaller. And this minimum is achieved when all the weight is given to the smaller value.</p>
<p>So, if $p < .5$, then the minimum possible value is $p$, and it is achieved when $1 - q = 1$, and thus $q = 0$. If instead $p > .5$, the minimum possible value is $1 - p$ and is achieved when $q = 1$.
<span style="float: right;">$\Box$</span></p>
<p>Based on this proof you can also see what happens when $p=.5$. It doesn’t matter what value $q$ takes: any value $0 \leq q \leq 1$ will result in the same expected inaccuracy, namely $.5\sqrt{2}$.</p>
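<p>If you’d like to check the theorem numerically, here’s a minimal Python sketch. The function names and the grid search are illustrative assumptions, not part of the proof: the code just evaluates the expected Euclidean inaccuracy of $(q, 1-q)$ on a grid of candidate values for $q$ and reports the one that does best.</p>
<pre><code class="language-python">import math

def expected_euclidean_inaccuracy(p, q):
    """Expected Euclidean distance of (q, 1-q) from the two worlds, by the lights of (p, 1-p)."""
    dist_heads = math.hypot(q - 1, (1 - q) - 0)  # distance from (1, 0)
    dist_tails = math.hypot(q - 0, (1 - q) - 1)  # distance from (0, 1)
    return p * dist_heads + (1 - p) * dist_tails

def best_q(p, steps=100):
    """Grid-search q in [0, 1] for the lowest expected inaccuracy (illustrative helper)."""
    grid = [i / steps for i in range(steps + 1)]
    return min(grid, key=lambda q: expected_euclidean_inaccuracy(p, q))

for p in (0.3, 0.51, 0.6):
    print(p, best_q(p))
</code></pre>
<p>For $p = .51$ the search should settle on $q = 1$, and for $p = .3$ on $q = 0$, just as the theorem says.</p>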
<p>So here’s the problem with Euclidean distance as a way of measuring inaccuracy. As soon as you find yourself leaning one way or another on heads-vs.-tails, you’re driven to extremes. If you get information that makes heads slightly more likely, say $.51$ for example, your expected inaccuracy is minimized by leaping to the conclusion that the coin will certainly come up heads.</p>
<p>So any probability assignment to heads/tails besides $(.5, .5)$ is self-undermining. It gives you cause to adopt some other assignment—an extreme one, at that.</p>
<p>Even at $(.5, .5)$ things aren’t so happy, btw. Any other assignment of probabilities is just as good as far as minimizing inaccuracy goes. So even if the pursuit of accuracy doesn’t <em>require</em> you to change your opinion, it still <em>permits</em> you to do so. As far as accuracy goes, being indifferent about the coin toss also makes you indifferent about what opinion to hold. Which is pretty strange in itself.</p>
<h1 id="where-this-leaves-us">Where This Leaves Us</h1>
<p>Wait a minute: if Euclidean distance is a bad way to measure inaccuracy, then what’s the use of the diagram we started with?? And what’s the right way to measure inaccuracy?</p>
<p>We’ll tackle these questions in the next post. But here’s the short answer.</p>
<p>One common way of measuring inaccuracy is a variation on Euclidean distance called <em>Brier</em> distance. Brier distance is just enough like Euclidean distance to vindicate the reasoning we did with our opening diagram. But it’s different enough from Euclidean distance to avoid the instability problem we ended up with.</p>
<p>So what is Brier distance? It’s just the square of Euclidean distance. Just take the square root symbol off Euclid’s formula and you’ve got the formula for Brier distance. Next time we’ll see how that one change makes all the right differences.</p>
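<p>As a quick preview, here’s the same kind of grid search with the square root removed. It’s only a sketch under assumed names (the helper functions are made up for illustration), and it covers just the two-dimensional coin case discussed above.</p>
<pre><code class="language-python">def expected_brier_inaccuracy(p, q):
    """Expected squared Euclidean distance of (q, 1-q) from (1, 0) and (0, 1), weighted by (p, 1-p)."""
    dist_heads = (q - 1) ** 2 + ((1 - q) - 0) ** 2  # squared distance from (1, 0)
    dist_tails = (q - 0) ** 2 + ((1 - q) - 1) ** 2  # squared distance from (0, 1)
    return p * dist_heads + (1 - p) * dist_tails

def best_brier_q(p, steps=100):
    """Grid-search q in [0, 1] for the lowest expected Brier inaccuracy (illustrative helper)."""
    grid = [i / steps for i in range(steps + 1)]
    return min(grid, key=lambda q: expected_brier_inaccuracy(p, q))

for p in (0.3, 0.51, 0.6):
    print(p, best_brier_q(p))
</code></pre>
<p>With the square root gone, the best $q$ on the grid should be $p$ itself rather than an extreme.</p>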
An Editor's Favourite Days of the Week
http://jonathanweisberg.org/post/An%20Editors%20Favourite%20Days/
Mon, 09 Jan 2017 10:13:49 -0500http://jonathanweisberg.org/post/An%20Editors%20Favourite%20Days/
<p>Finding willing referees is one of the biggest challenges for a journal editor. Are referees more willing some days of the week than others? Apparently they are, on Mondays… and Thursdays, for some reason. At least, that’s how things have gone at <a href="http://ergophiljournal.org/" target="_blank"><em>Ergo</em></a> the last couple years (2015 and 2016).</p>
<h1 id="data">Data</h1>
<p>Consider the “bounce rate” for a given day of the week: the portion of invites sent on that day that end up being declined (<em>bounce rate = #declined / #invited</em>).</p>
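<p>In code, the definition amounts to a single division. Here’s a minimal Python sketch; the function name and the counts are hypothetical, chosen purely for illustration, and are not <em>Ergo</em>’s actual numbers.</p>
<pre><code class="language-python">def bounce_rate(declined, invited):
    """Portion of the invitations sent on a given day that ended up being declined."""
    if invited == 0:
        raise ValueError("no invitations were sent on this day")
    return declined / invited

# Hypothetical counts, for illustration only (not real journal data).
print(bounce_rate(declined=11, invited=25))  # 0.44
</code></pre>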
<p>Editors prefer a lower bounce rate. And, on average over the last two years, the lowest bounce rates at <em>Ergo</em> were on Monday and Thursday:</p>
<p><img src="http://jonathanweisberg.org/img/editor_favourite_days_files/unnamed-chunk-2-1.png" alt="" /></p>
<p>It’s an odd pattern… maybe it’s not a pattern at all? To check, let’s look at 2015 and 2016 separately:</p>
<p><img src="http://jonathanweisberg.org/img/editor_favourite_days_files/unnamed-chunk-3-1.png" alt="" /></p>
<p>It sure looks like the same pattern each year.<sup class="footnote-ref" id="fnref:0"><a rel="footnote" href="#fn:0">1</a></sup></p>
<p>Moreover, going back to the overall data from the first graph, there’s pretty significant fluctuation: from a minimum of 0.44 on Thursdays to a maximum of 0.6 on Tuesdays/Fridays/Saturdays, a difference of 0.16. That would be a lot of fluctuation if it were just random noise. Given how large the sample is (1280 invitations in all), it seems pretty safe to say this is a real thing.<sup class="footnote-ref" id="fnref:1"><a rel="footnote" href="#fn:1">2</a></sup></p>
<p>What about the difference between Mondays and Thursdays—is that significant or just noise? Well, it may not look trivial, but it’s not statistically significant according to the standard test for such things.<sup class="footnote-ref" id="fnref:2"><a rel="footnote" href="#fn:2">3</a></sup> As for the other days of the week, they’re generally even closer together, and the same test suggests there’s nothing significant in the variation there, either.<sup class="footnote-ref" id="fnref:3"><a rel="footnote" href="#fn:3">4</a></sup></p>
<p>Apparently, the most we can say is that Mondays and Thursdays stand out from the rest of the week.</p>
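<p>For anyone who wants to sanity-check the footnoted p-values without the raw data, here’s a rough Python sketch that recovers an upper-tail probability from a reported $\chi^2$ statistic and its degrees of freedom. The function name is made up, and the sketch only handles the cases that occur in the footnotes (one degree of freedom, or an even number), using standard closed forms for the $\chi^2$ tail.</p>
<pre><code class="language-python">import math

def chi2_p_value(stat, df):
    """Upper-tail probability of a chi-square statistic, for df = 1 or even df only."""
    if df == 1:
        # With one degree of freedom the tail reduces to the complementary error function.
        return math.erfc(math.sqrt(stat / 2))
    if df % 2 == 0:
        # With even df the tail is exp(-x/2) times the sum of (x/2)^k / k!
        # for k from 0 to df/2 - 1.
        half = stat / 2
        terms = sum(half ** k / math.factorial(k) for k in range(df // 2))
        return math.exp(-half) * terms
    raise ValueError("this sketch only handles df = 1 or even df")

# (statistic, degrees of freedom) pairs as reported in the footnotes
for stat, df in [(18.93, 6), (16.35, 1), (0.98, 1), (0.9, 4)]:
    print(df, stat, round(chi2_p_value(stat, df), 4))
</code></pre>
<p>The printed values can be compared against the p-values in the footnotes. Small discrepancies are to be expected, since the statistics there are rounded, and R may apply a continuity correction in the one-degree-of-freedom tests.</p>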
<h1 id="theory">Theory</h1>
<p>So what’s the explanation?</p>
<p>Monday I get. It’s the only day of the week that immediately follows the weekend (except on long weekends, obviously). So Mondays are when we’re most recovered from the burnout of the previous week. They’re when we’re full of expectations and plans for the coming week, and most deceived about how productive and unbusy Thursday and Friday will be.</p>
<p>But Thursdays are more puzzling. Why would they be special? A colleague suggested false optimism about free time as an explanation. “That makes sense for Mondays,” I thought, “but not Thursdays”.</p>
<p>Maybe it does make sense after all, though. Thursday is the closest you can get to the end of the week without being a Friday. And Fridays are basically the weekend, right? So maybe Thursday is late enough in the week that it seems there’s a whole work-week ahead to fill (like Monday but with even more <a href="https://en.wikipedia.org/wiki/Temporal_discounting" target="_blank">temporal discounting</a>). Whereas Fridays, well, dammit! That’s basically the weekend already. And you just agreed to a bunch of referee work yesterday!</p>
<h1 id="future-research">Future Research</h1>
<p>Well, it’s a theory. Clearly, more research is needed. In a future post I’ll dig into the data some more to look for possible explanations.</p>
<h1 id="technical-notes">Technical Notes</h1>
<p>This post was written in R Markdown and the source is <a href="https://github.com/jweisber/rgo/blob/master/editor%20favourite%20days/editor%20favourite%20days.Rmd" target="_blank">available on GitHub</a>. I’m new to both R and classical statistics, and this post is a learning exercise for me. So I encourage you to check the code and contact me with corrections.</p>
<div class="footnotes">
<hr />
<ol>
<li id="fn:0">Except for Saturdays where the 2015 data is sparse, at only 32 invites. Just 6 referees would have had to respond differently to Saturday-invites in 2015 to eliminate the difference from 2016.
<a class="footnote-return" href="#fnref:0"><sup>[return]</sup></a></li>
<li id="fn:1">For the statistically inclined, if the null hypothesis is that a referee’s response is independent of the day of the week of the invite, then: $\chi^2(6, N = 1280) = 18.93$, $p = 0.004$. If instead the null hypothesis is that a referee’s response is independent of whether or not the day of the invite is a Monday or Thursday, then: $\chi^2(1, N = 1280) = 16.35$, $p = 10^{-4}$.
<a class="footnote-return" href="#fnref:1"><sup>[return]</sup></a></li>
<li id="fn:2">$\chi^2(1, N = 403) = 0.98$, $p = 0.322$.
<a class="footnote-return" href="#fnref:2"><sup>[return]</sup></a></li>
<li id="fn:3">$\chi^2(4, N = 877) = 0.9$, $p = 0.924$.
<a class="footnote-return" href="#fnref:3"><sup>[return]</sup></a></li>
</ol>
</div>
Victor: Hugo
http://jonathanweisberg.org/post/Hugo/
Sun, 08 Jan 2017 00:00:01 -0500http://jonathanweisberg.org/post/Hugo/
<p>I chose Hugo to manage this site mainly because it’s shiny and new. But there were some substantive reasons, too.</p>
<p>Hugo is famously fast—<em>really</em> fast—but that wasn’t the big draw for me. For a personal website, even one with many years’ worth of blogging accrued, Hugo’s speed just isn’t that important. Quoth <a href="http://thecodebarbarian.com/2015/02/06/static_site_generators.html" target="_blank">the Code Barbarian</a>:</p>
<blockquote>
<p>being the fastest static site generator is like being the resturaunt with the cleanest bathroom: sure, I prefer a faster generator, but I wouldn’t pick one generator over another because one shaves off 100ms.</p>
</blockquote>
<p>What attracted me more was the lack of dependencies. Hugo is just a single, pre-compiled binary. That’s one reason I didn’t go with the Ruby-based Jekyll. Despite being a Rails regular, I always get a twinge of vertigo when I have to wrestle with gems, bundler, rvm, and all that. It seems there’s always just enough time between bundler/rvm meltdowns for me to forget how it’s all supposed to work. So I never really get fluent, and never really feel confident working with those tools.</p>
<p>So how did getting started with Hugo go? I’m pretty happy now that I’ve got things set up as I need them. But there were some unexpected bumps in the road.</p>
<h1 id="templates">Templates</h1>
<p>I ended up paying for snubbing my beloved Ruby. Hugo is written in the Go language, and its template engine isn’t exactly user-friendly. It’s especially unpleasant if you’re coming from a mature and sugary place like Rails. Quoth the Code Barbarian again:</p>
<blockquote>
<p>I’m sure somebody out there likes Go’s HTML templating, but, as somebody who’s used to more sophisticated tools like Jade, it makes gouging my eyes out with a rusty fork seem like an appealing alternative.</p>
</blockquote>
<p>I dunno about <em>rusty</em>, but… well, here’s an example. The following snippet is from the template for this site’s header. It determines whether to include MathJax, which can be done on either a site-wide or a by-page basis:</p>
<pre><code class="language-HTML">{{- if (or (.Site.Params.math) (.Params.math)) }}
<script type="text/javascript" src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
{{- end }}
</code></pre>
<p>A few things stand out here. First is the <a href="https://en.wikipedia.org/wiki/Polish_notation" target="_blank">Polish notation</a>-esque way the logic has to be written, with the <code>or</code> to the left of its two arguments. What you really want, of course, is the more familiar <a href="https://en.wikipedia.org/wiki/Infix_notation" target="_blank">infix</a> construction:</p>
<pre><code class="language-HTML">{{- if (.Site.Params.math or .Params.math) -}}
...
{{- end -}}
</code></pre>
<p>But that doesn’t appear to be valid Go template-speak.</p>
<p>Then there’s the way the parameters are referenced: beginning with the anonymous and mysterious <code>.</code>, which refers to the current “context”. Concision is nice and all, but just a <code>.</code>?? A <code>.</code> with nothing to its left just seems unnecessarily obscure, almost Perl-like in its obfuscation. (And then there’s the fact that there are two sets of parameters named <code>Params</code>.)</p>
<p><img src="http://68.media.tumblr.com/edc428840aab503b2407a9629793058c/tumblr_mqm0qbH01O1r3vs52o1_500.gif" alt="Shawshank Redemption: How can you be so obtuse?" /></p>
<p>Another small thing: the <code>-</code> sign attached to each <code>{{</code> and <code>}}</code> prevents a blank line from appearing in the rendered HTML where the template code used to be. Rails used to need this kind of thing, but not anymore. And given how cluttered templates can get, every little bit of tidying helps. It’d be nice to have a configuration option that just makes the <code>-</code> implicit by default.</p>
<p>Getting even more nit-picky: I personally find the “mustache” notation <code>{{</code> and <code>}}</code> unintuitive, even kind of distracting. It doesn’t sit well with the surrounding HTML like the corresponding <code><%</code> and <code>%></code> of embedded Ruby does. You might think it helps make the Go code stand out from the HTML, but that’s really a job for your syntax-highlighter. And something about those double curly braces just doesn’t say to me: <em>your code here!</em> I just see redundant C syntax instead.</p>
<p>These are small things, and probably idiosyncratic to me, in large part. But they did accumulate to noticeably slow down the whole getting-started process.</p>
<h1 id="themes">Themes</h1>
<p>Ironically, none of that would have been an issue if I’d just gone with one of the existing, out-of-the-box themes, as I originally intended. Then I could’ve just poured my content into an existing template, mixed, and served. But I couldn’t find a theme that suited me! (Shocking, I know.) There are <a href="http://themes.gohugo.io/" target="_blank">loads</a>, including some great-looking ones. But they’re mostly designed for developers (<em>gasp</em>) and the people they develop for (<em>escandalo</em>!), not academics.</p>
<p>The <a href="https://github.com/gcushen/hugo-academic" target="_blank">Academic</a> theme seemed the obvious choice. And it’s probably great if you’re a big-time engineering prof running a ginormous lab teeming with underlings who churn out ten publications a year under your name. But for a humanist shitmuncher such as myself, all the industrial-strength plumbing just got in the way.</p>
<p>The <a href="http://themes.gohugo.io/hugo-finite/" target="_blank">Finite</a> theme was much closer to what I needed, but still overkill. And using it effectively would’ve required learning a lot about Hugo’s inner workings anyway.</p>
<p>So I sucked it up and created my own theme, <a href="http://www.github.com/jweisber/scarab" target="_blank">Scarab</a>. It’s basically a dumbed down version of Finite (but using Bootstrap instead of Foundation). It should probably be named Infinitesimal, not Scarab. But like I said: <a href="https://en.wikipedia.org/wiki/Dung_beetle" target="_blank">shitmuncher</a>.</p>
<h1 id="docs">Docs</h1>
<p>Luckily, Hugo has pretty thorough and clear documentation, including a solid tutorial on creating a new theme. But to really get where I wanted to go, I needed to read the full docs pretty much all the way through. And even though they’re thorough and clear (for the most part), they are occasionally terse. The organization also leaves something to be desired. So there was a lot of trial and error and googling, and there still is.</p>
<h1 id="git-glitch">Git Glitch</h1>
<p>Once I had the whole site built and themed, it looked like the sailing would be smooth. The workflow for creating new content is straightforward and fluid. But, of course, there was still one crag to navigate: version control.</p>
<p>I basically have three separate projects in one place. First there’s the top-level directory, <code>hugo/</code>, with subdirectories for my Markdown content, images, etc. Second is the <code>hugo/public/</code> subdirectory, where Hugo assembles all that stuff into the actual HTML and related site materials that get uploaded to a public server. Third and finally, there’s the <code>hugo/themes/scarab/</code> subdirectory for my custom theme, which I’m constantly tweaking and updating.</p>
<p>Each of these needs its own git repo. Why?</p>
<p>Well, this site is hosted on GitHub Pages as a “User” page. So <code>hugo/public/</code> needs to be on the <code>master</code> branch. And since I don’t want to make all my drafts and raw materials public on GitHub, <code>hugo/</code> and <code>hugo/public/</code> need separate repos. As for <code>hugo/themes/scarab/</code>, that’s its own project. People should be able to download or fork it on its own, without all my stupid content bundled in. So that’s another repo.</p>
<p>So I have three projects, two of which are in (sub)subdirectories of a third, with <code>.gitignore</code> the only thing making sure they don’t get all scrambled up. And every time I want to update my site, I have to do two commits: one for <code>hugo/</code> and one for <code>hugo/public/</code>. I only have to do one push, since only the second repo is up on GitHub (I don’t have a paid/private account). Unless I’m updating the theme, in which case I have to do three commits and two pushes. It’s not ideal.</p>
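<p>For what it’s worth, the top-level <code>.gitignore</code> in a setup like this might look something like the following. This is a guess at the minimal contents based on the layout described above, not necessarily the exact file in use:</p>
<pre><code># hugo/.gitignore (assumed contents, for illustration)
# The generated site lives in its own repo.
public/
# The custom theme lives in its own repo too.
themes/scarab/
</code></pre>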
<h1 id="conclusion">Conclusion</h1>
<p>If I sound like I’m complaining, that’s because I love unnecessarily complex technical projects that give me something to complain about. Especially if I learn something in the process. And I did learn a thing or two in this process. I’m even a little sad now that it’s over and Hugo is humming along, staying out of my way so that I have to actually work on creating content. Maybe I’ll just try one more tweak to the theme first; I hear there’s a new free serif font that’s supposed to be amaze…</p>
iTerm from RStudio
http://jonathanweisberg.org/post/iTerm%20with%20RStudio/
Fri, 06 Jan 2017 00:00:00 -0500http://jonathanweisberg.org/post/iTerm%20with%20RStudio/
<p>I can’t abide OS X’s built-in Terminal.app: too janky by half. So like any reasonable obsessive, I use iTerm2.</p>
<p>Now, RStudio helpfully provides a menu item for opening a shell in the current project’s directory. But it calls Terminal.app, with no option to substitute an alternative.</p>
<p>R-bloggers has <a href="https://www.r-bloggers.com/change-the-default-shell-action-in-rstudio-for-os-x/" target="_blank">a helpful post</a> about this. But it’s from way back in the dark ages (January 2014), and the solution there doesn’t seem to work any more thanks to <a href="https://www.iterm2.com/documentation-scripting.html" target="_blank">changes in iTerm</a>.</p>
<p>A little tweaking yields a working variant, though—improved, even. Just back up the script at <code>/Applications/RStudio.app/Contents/MacOS/mac-terminal</code> and edit the original to read:</p>
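<pre><code class="language-applescript">#!/usr/bin/osascript
on run argv
    -- The directory to open arrives as the first argument; quote it for the shell
    set dir to quoted form of (first item of argv)
    -- Note whether iTerm was already open before we activate it
    if application "iTerm" is running then
        set wasRunning to true
    else
        set wasRunning to false
    end if
    tell application "iTerm"
        activate
        -- If iTerm was already running, open a new tab for this session
        if wasRunning then
            tell current window
                create tab with default profile
            end tell
        end if
        -- Label the session, then move to the project directory and tidy up
        tell last session of current tab of current window
            set name to "RStudio Session"
            write text "cd " & dir & "; clear"
        end tell
    end tell
end run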
</code></pre>
<p>I also added a keyboard shortcut, ⌘+T, in RStudio under Tools > Modify Keyboard Shortcuts….</p>
<p>One nagging imperfection remains, though. Is the best we can do to drop ourselves into the project directory really <code>write text "cd " & dir</code>? Watching AppleScript scramble to <code>cd</code> and <code>clear</code> pains me each time. So inelegant.</p>
<p>Ah, well. It’s not as though elegance was the whole point of switching to iTerm in the first place…</p>
This Must Be the Place
http://jonathanweisberg.org/post/This%20Must%20Be%20the%20Place/
Tue, 03 Jan 2017 00:00:00 -0500http://jonathanweisberg.org/post/This%20Must%20Be%20the%20Place/
<p>For years I lovingly hand-coded my personal website. The ethos was pure, the product normcore: a single page, packed with links to my academic papers and other projects, all lightly seasoned with a sprinkle of explanatory text.</p>
<p>I scoffed at the polished and capacious WordPress sites of my colleagues. “A navbar? With <em>four</em> sections?? Does that link to a PDF of your CV really need a whole page to itself?”</p>
<p>I also scoffed at colleagues with personal blogs featuring mostly tumbleweeds and the occasional stray comment from some ill-informed passerby.</p>
<p>But then I discovered static site generators like <a href="https://gohugo.io/" target="_blank">Hugo</a> and <a href="https://jekyllrb.com" target="_blank">Jekyll</a>. I can’t resist a shiny new tech-toy, especially when I have no real use for it. I’d always kind of envied my tumbleweed blogging colleagues anyway—they had a place to think out loud, to jot down ideas, and to escape the confines of formal academic writing. And even if Prof. WordPress’s site had all the heart and soul of an Ikea display, it still <em>looked</em> better than my homespun design.</p>
<p><img src="https://amandaelsewhere.files.wordpress.com/2011/10/pretty-in-pink.jpg" alt="Pretty in Pink" /></p>
<p>So it was only a matter of time until I caved. I installed Hugo and created an unnecessarily complex new site, complete with blog and tumbleweeds. (No comments section, though.) I’m hoping it’ll motivate me to create worthy content to furnish it <em>post hoc</em>.</p>
<p>But for today self-indulgence is the watchword. So this inaugural post just issues some promissory notes: here are some topics and posts in the pipeline. (Because if I promise it in public then I actually have to deliver, right?)</p>
<ul>
<li>Ideas about the future of academic journals, especially in my field, philosophy.</li>
<li>Adventures in amateur data science. My new favourite toy is <a href="https://www.r-project.org/" target="_blank">R</a>, and I’ve been using it to explore the data collected by <a href="http://www.ergophiljournal.org" target="_blank">Ergo, an Open Access Journal of Philosophy</a>. I’ll be posting fun facts and findings from that project.</li>
<li>Accuracy for Dummies: tutorials and ideas about the accuracy framework in formal epistemology, as I work through <a href="https://richardpettigrew.wordpress.com/" target="_blank">Richard Pettigrew’s</a> excellent <a href="https://global.oup.com/academic/product/accuracy-and-the-laws-of-credence-9780198732716?cc=ca&lang=en&" target="_blank">new book</a>.</li>
<li>Assorted nerdery: hacks, tips, and hot-takes from my inept fumblings with my technological toys of choice.</li>
<li>Pop philosophy on sundry topics: objectivity, politics, probability and statistics, food… whatever I can plausibly fake some expertise on with enough caffeine.</li>
</ul>
<p>If you’re the sort of bizarre creature who might be interested in this particular slurry, you can subscribe to <a href="http://jonathanweisberg.org/index.xml" type="application/rss+xml" target="_blank">the RSS feed</a>. But I can’t recommend it.</p>
Risk Writ Large
http://jonathanweisberg.org/publication/Risk%20Writ%20Large/
Wed, 28 Dec 2016 11:20:45 -0500http://jonathanweisberg.org/publication/Risk%20Writ%20Large/
<p>Risk-weighted expected utility (REU) theory is motivated by small-world problems like the Allais paradox, but it is a grand-world theory by nature. And, at the grand-world level, its ability to handle the Allais paradox is dubious. The REU model described in <em>Risk and Rationality</em> turns out to be risk-seeking rather than risk-averse on one natural way of formulating the Allais gambles in the grand-world context. This result illustrates a general problem with the case for REU theory, we argue. There is a tension between the small-world thinking marshaled against standard expected utility theory, and the grand-world thinking inherent to the risk-weighted alternative.</p>