“Novel Analytics from James Joyce to the Bestseller Code”

Professor Matthew L. Jockers, University of Nebraska

Thursday, March 2, 4:30 PM

Russell House

Refreshments provided


In a New Yorker article in 2014, Joshua Rothman asks, provocatively and rhetorically: “If you’re an English professor, how should you spend your time: producing [close] ‘readings’ of the literary works that you care about (art), or looking for the [distant] patterns that shape whole literary forms or periods (science)?” Rothman’s parentheticals, “art” and “science,” make for a good editorial hook, but they frame a false dichotomy. The emerging debate in literary studies pitting traditional scholarly practices of close reading against digitally oriented methods of “distant” reading is a nonstarter. What gets disguised as an argument over method (close vs. distant) and discipline (art vs. science) is, in fact, an argument about interpretation and the ways that literary scholars collect and prioritize evidence. In this talk, Jockers proposes a methodological reconciliation that understands large-scale computational approaches to literature as entirely consistent with traditional practices of close reading.

Data, Computing and Journalism

Andrew B. Tran

Monday 2/13, 4:30 PM, Allbritton 103
Snacks and refreshments will be available

Increasingly, journalists are turning to tools that were once solely the domain of data analysts and computer scientists to create compelling visualizations and enhance their storytelling. Newsrooms are using accessible technology to process big and open data to assist in investigations, keep citizens informed, and help make institutions accountable, and they’re often following the tenets of data science, such as making their work transparent and reproducible. It’s important, now more than ever, that data not be hidden by government agencies from the public so that it instead might be used to illuminate the truth.

Andrew, currently a Koeppel Journalism Fellow at the Center for the Study of Public Life, is the senior data editor of TrendCT (http://trendct.org/about/), a CT Mirror affiliate. He was a founding producer of The Boston Globe’s Data Desk, where he used a variety of methods to visualize or tell stories with data. He was also an online producer at The Virginian-Pilot and a staff writer at the South Florida Sun-Sentinel. He’s a Metpro Fellow, a Chips Quinn Scholar, and a graduate of the University of Texas.

Gender and Power Dynamics in Elite Discourse: Evidence from the U.S. Supreme Court



Assistant Professor of Political Science and Sociology
University of Iowa

Do men talk over women even in elite settings? Whether it is the corporate board room or the kitchen table, gender dynamics affect the way men and women talk to one another. One would hope women who hold public office do not face similar biases, but this study shows that, even on the Supreme Court, women are spoken to differently than men. Using R, Python and a high-performance computing cluster, we obtained the text and audio from over 500,000 utterances during oral arguments from 1982-2014. Ultimately, we show male Justices and attorneys display verbal and non-verbal dominance towards female Justices.
This study is part of a larger research agenda that emphasizes the importance of elite non-verbal behavior, such as changes in vocal pitch. Unlike other time series, vocal pitch is incredibly stable at short intervals, meaning it can be extracted using a windowed autocorrelation function. While there are a number of other variables one can use, vocal pitch has been shown to be associated with dominance, even in institutional settings, making it particularly useful for this application, which is one of the first to highlight the “panel effects” long noted by judicial scholars. We show vocal pitch is not only indicative of underlying gender dynamics, but it also influences voting behavior. With that said, there are a number of other ways audio can be used for research, ranging from speaker segmentation to supervised classification. Part of the presentation will lay an important foundation and provide some guidelines for those interested in using these techniques for future work.
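
For readers curious what "a windowed autocorrelation function" can look like in practice, here is a minimal sketch in plain Python. It is not the study's own code: the function names, the 75-400 Hz search range, the 40 ms window, the 10 ms hop, and the 0.3 correlation threshold are all illustrative assumptions. The idea is to cut the audio into short frames and, in each frame, find the lag at which the signal best matches a shifted copy of itself; the sample rate divided by that lag is the pitch estimate.

```python
import math


def estimate_pitch_hz(frame, sample_rate, fmin=75.0, fmax=400.0, min_corr=0.3):
    """Return the pitch (Hz) of one short frame, or None if no clear period is found.

    fmin, fmax and min_corr are illustrative assumptions, not values from the study.
    """
    n = len(frame)
    mean = sum(frame) / n
    x = [s - mean for s in frame]              # remove any constant offset
    energy = sum(v * v for v in x)
    if energy == 0.0:
        return None                            # silent frame
    min_lag = max(1, int(sample_rate / fmax))  # shortest period considered
    max_lag = min(int(sample_rate / fmin), n - 1)
    best_lag, best_score = None, min_corr
    for lag in range(min_lag, max_lag + 1):
        # Autocorrelation at this lag, normalised by the frame energy.
        score = sum(x[i] * x[i + lag] for i in range(n - lag)) / energy
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag if best_lag is not None else None


def pitch_track(signal, sample_rate, window_s=0.04, hop_s=0.01):
    """Slide a short window over the signal and estimate pitch in each frame."""
    win = round(window_s * sample_rate)
    hop = round(hop_s * sample_rate)
    return [
        estimate_pitch_hz(signal[start:start + win], sample_rate)
        for start in range(0, len(signal) - win + 1, hop)
    ]


# Synthetic check: a pure 200 Hz tone sampled at 8 kHz for 0.2 seconds.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 200 * t / sample_rate) for t in range(1600)]
track = pitch_track(tone, sample_rate)
```

On the synthetic tone, every frame in `track` should come out at about 200 Hz. Real speech would need more care (voicing detection, octave errors, noise), which is why dedicated audio tools are normally used for this step.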

“Fighting Poverty with Data: Research at the Intersection of Machine Learning and Development Economics” with Joshua Blumenstock ’03


Friday, April 15th at 12pm in Downey House 113.

In wealthy nations, novel sources of “big data” from the internet and social media are creating new opportunities for commercial profit, enabling new approaches to social science research, and inspiring new perspectives on public policy. In developing economies, however, fewer sources of robust data exist, and it remains unclear if and how the world’s poor will benefit from the ongoing “data revolution.” In this talk, I will discuss a series of studies that combine insights from machine learning with traditional methods in empirical economics to better understand economic development and vulnerability. The talk will focus on recent results from Afghanistan, Ghana, and Rwanda, which show how terabyte-scale data from mobile phone networks can be combined with field-based experiments and on-the-ground interviews to construct accurate estimates of the distribution of poverty and wealth. In resource-constrained environments where censuses and household surveys are rare, this creates options for gathering localized and timely information at a fraction of the cost of traditional methods.

Joshua Blumenstock graduated from Wesleyan University in 2003 with degrees in Computer Science and Physics. After Wesleyan, he did a Watson Fellowship, spent a few years in internet startups, and then went back to grad school at U.C. Berkeley, where he received a Ph.D. in Information Science and an M.A. in Economics. Currently, Joshua is an Assistant Professor at the University of Washington, with faculty appointments in the Information School and the Department of Computer Science and Engineering. He is also the founder and co-Director of the Data Science and Analytics Lab, where he develops new methods for the analysis of large-scale behavioral data, with a focus on how such data can be used to better understand poverty and economic development. Recent projects combine field experiments with terabyte-scale spatiotemporal network data to model decision-making in poor and conflict-affected regions of the world. He is a recipient of the Intel Faculty Early Career Honor, a Gates Millennium Grand Challenge award, and a Google Faculty Research Award.

This event is sponsored by DaCKI.

“Signaling not Persuasion: the Surprising Power of the Presidential Bully Pulpit” with Justin Grimmer, Stanford

Thursday, March 24th at 4:30pm in Russell House

Why do Presidents “go public”?  We use novel natural experiments, social media data, and extensive news analysis to show that Presidents have little direct effect on public opinion when they appeal to the public.  Rather, we argue, Presidents go public to signal to Congress that an issue is particularly important to them.

Justin Grimmer is Associate Professor of Political Science at Stanford University.

This event is sponsored by DaCKI.

DaCKI Manhattans and Martinis


Tuesday, December 15th at 4:15 in Allbritton 311

Classes will be but a memory, a bit of grit for your pearls of winter productivity. There will be nice snacks and cocktails of course, rye and gin, maraschino and lemon where they’re meant to be.

We’ll be there to hear about (and see) some of the diverse ways that we analyse, question, and shape images in our research. Come with your ideas about methods and experiences from your own work and we can share some of what we do across the campus. Teaching ourselves and our students how to analyze images and to analyze them differently is increasingly important.

To get our thoughts going, we’ll hear briefly from Nadja Aksamija (Art History), Christopher Chenier (Digital Design Studio), Steve Devoto (Biology), Justin Marks (Math), and Greg Voth (Physics).

Matthew Daniels: “Depiction, even over Description: How Data Journalism is Changing the Art of Storytelling”

At 4:30 on Wednesday November 4th, in 41 Wyllys, Room 112
In cooperation with the Allbritton Center, the Digital and Computational Knowledge Initiative (DaCKI) is pleased to bring Matt Daniels to Wesleyan next week to give a talk on Wednesday and then to meet with students in classes and small groups on Thursday.
He is a media artist and designer, fascinated with the possibilities of data-driven narrative. For Daniels, this has often meant analysing and illustrating the content and popularity of music, its lyrics, and its locations. He has produced infographics keyed to such things as the size of rappers’ vocabularies and the timelessness of some music based on Spotify data. He publishes an online magazine called Polygraph.

Textual and Content Analysis: Introduction to Problems and Projects (with Manhattans & Martinis)

To mark the end of the teaching year, DaCKI will be hosting its third year-end, relaxing introduction to the digital, featuring Manhattans & Martinis. On May 7th at 4:15 in Allbritton 311 (the room with the view), come for a cocktail, some snacks, and conversation. We’ll listen to several of our colleagues share their reflections on trying to engage with digital and computational textual and content analysis in the classroom and in research.

There is a wide variety of software and methods available that promise to let us see and use our texts and entire textual corpora more effectively. Some also allow for parsing interviews, sounds, and visual content. In some cases, the data might be big, vast streams of tweets or advertising, but in others they are conventional bodies of literature or even just a few texts seen in a new light. The possibilities seem endless and a bit intimidating, and we hope to help by clarifying directions, since digital content analysis promises to enrich teaching as much as research.

This should be a good way to start thinking of summer research and reading, and I hope to see you there.

“Predicting Premier League Soccer Using a Sentiment Analysis of Twitter” with Professor Robert Schumaker

In this talk Professor Schumaker, who teaches at Central Connecticut State University, will try to answer the question: “Can the sentiment contained in tweets serve as a meaningful proxy to predict match outcomes and if so, can the magnitude of these outcomes be similarly predicted based on the degree of sentiment?”

This talk should be an excellent place to learn about some of the key areas the initiative is trying to explore: big data and text analysis. The predictive power of sentiment analysis has been a consistent element of Schumaker’s work, which he has applied to the stock market as well as sports.

There will be some refreshments available, and we’ve found that a Friday afternoon talk can be a nice way to end the week. What’s more, this talk gives you just enough time to work things out before the Saturday morning kickoffs and certainly before Chelsea goes to the Arsenal that Sunday.


Can the sentiment contained in tweets serve as a meaningful proxy to predict match outcomes and if so, can the magnitude of these outcomes be similarly predicted based on the degree of sentiment? To answer these questions we constructed the CentralSport system to gather tweets from the English Premier League and analyze their sentiment content for use in predicting match outcomes. From our analysis, we found that the models incorporating positive tweets were easier to profit from (the All Positive model netted a $3,375.18 excess return). Looking deeper into the models, we found point spread prediction was possible. Clubs with at least 1,000 more negative tweets than their rival would typically lose by 1 goal (observed 65.2% of the time). Clubs with 2,000 to 10,000 more positive tweets would win by 1 goal (56.25%) and clubs with 10,000+ more positive tweets would win by 2+ goals (100%). These results demonstrate the power of hidden information contained within tweet sentiment and have implications for wagering systems.
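
The point spread thresholds reported in the abstract can be restated as a simple lookup, sketched below in Python. This is only a restatement of the reported cut-offs, not the CentralSport system itself: the function name, its arguments, the order in which the rules are checked, and the handling of a gap of exactly 10,000 are all assumptions made for illustration.

```python
def predict_margin(positive_gap, negative_gap):
    """Map tweet-count gaps to the match margin suggested by the abstract's thresholds.

    positive_gap: the club's positive tweets minus its rival's positive tweets.
    negative_gap: the club's negative tweets minus its rival's negative tweets.
    Returns a short label, or None when no reported threshold applies.
    Checking the positive rules first is an assumption, not part of the source.
    """
    if positive_gap >= 10_000:
        return "win by 2+ goals"
    if positive_gap >= 2_000:
        return "win by 1 goal"
    if negative_gap >= 1_000:
        return "lose by 1 goal"
    return None


# Hypothetical tweet gaps, made up purely to exercise each rule.
examples = [(12_000, 0), (5_000, 0), (0, 1_500), (500, 200)]
labels = [predict_margin(pos, neg) for pos, neg in examples]
```

Keep in mind that the abstract reports how often each pattern held (65.2%, 56.25%, and 100% of observed cases), so a lookup like this describes tendencies in the sample rather than guaranteed results.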

Friday, April 24th at 4:15 in Usdan 108