How I Got into Whalen's Lab

As another internship season approaches, many friends are asking me how I got my summer 2017 biology research position without having taken any biology at Haverford. [1]

The process was not straightforward. I started out considering a major at Bryn Mawr, switched midway to Swarthmore, and finally decided, "If biology, then Haverford." Last summer was my chance to get a taste of biology at Haverford.

Once the decision was made, I immediately reached out to Professor Whalen, whom I had chatted with at academic tea, a casual gathering at the beginning of every semester where representatives from every department answer students' questions. I had also browsed professors' Haverford webpages, where their CVs and research directions are listed. Although my tentative major was biology, I could not understand the content of any project our biology professors listed. Still, marine science, drug resistance, and a photo of Professor Whalen smiling against a solid blue ocean-sky background attracted me above all.

I sent out my first email on Dec. 22, 2016, and went off to have fun in New Haven. When I came back, a reply had been sitting in my mailbox since the day I left. What a quick response! To this day, Professor Whalen's efficiency still surprises, motivates, and occasionally scares me. Back then, I immediately arranged to meet with my first-ever mentor.

She showed me all her ongoing projects and some summer opportunities, and asked me to show up at lab meetings the coming semester. Since I could only work on campus as an international student, we decided to begin with the "bacterial response to a chemical" project already underway in her lab. That was it! I became part of Whalen's lab. When the summer scholarship application season came, she encouraged me to apply for the Kovaric Fellowship [2]. When that application failed, she applied for funding from the Provost on my behalf, so I could get paid. Everything was settled as early as Mar. 18, 2017, after which I just sat back and pictured the richness of the coming summer.

 

Takeaways for new applicants:

  1. Start collecting information early; browse professors' and institutions' webpages, and from there find out more
  2. After narrowing down your choices, reach out (sometimes it takes a while to receive a reply; don't feel discouraged or overly anxious)
  3. Academic tea is a great space to ask any lay (or expert) questions about subjects, courses, majors, internships, and more; professors are there for you
  4. Don't be afraid if a professor's research seems hard to understand!

 

[1] Only sophomores and above can take biology courses at Haverford.

[2] For funding opportunities, please see: www.haverford.edu/integrated-natural-sciences-center/programs-funding/student-research-funding

Summer Research Report: Predicting March Madness

Last summer I received funding from a Velay Fellowship to do sports analytics research at Davidson College in North Carolina; my main project was developing an algorithm to predict outcomes in the NCAA Division I March Madness Men's Basketball Tournament. I finished the project in August and was able to backtest five years of tournaments, demonstrating that my algorithm would have outperformed other methods (including FiveThirtyEight, Power Rank, and numberFire), but I knew the first real test would come now, in March, as my algorithm tackles a tournament field in real time for the first time. Below is the table of predicted outcomes it produced, along with some further explanation and insight into what I've learned.

What am I looking at?

Each row of this table represents one of 64 teams in the 2018 March Madness Tournament. Choose a team in the first column and move to the second column to see the probability that that team will appear in the second round or, equivalently, the chance they will win their first matchup. The probabilities range from 0 to 1 with, for example, .54 indicating a 54% chance of the team successfully making it to the second round. The following columns will give the chance that the team appears in each of the subsequent rounds of the tournament, and the final column gives the probability that the team will be named champions. The chance any given team will appear in the second round (Round of 32) is greater than the chance they will appear in the third round (Sweet Sixteen) which is in turn greater than the chance they will appear in the fourth round (Elite Eight) and so on. According to this table, Virginia has the highest chance (99.2 percent) of winning their first game and making it to the second round and a 28.66 percent chance of winning the whole tournament.

What exactly determined these probabilities?

I pulled data from kenpom.com and masseyratings.com and used JMP statistical software to look for correlations between over 50 team and player stats and game outcomes. Sometimes there is a clear correlation that can easily be modeled by a linear regression: for example, points per possession is strongly correlated with winning games. Other statistics are more complicated: for example, player experience doesn't always strongly correlate with success. The best teams are often either heavy with "one-and-done" freshmen or loaded with experienced upperclassmen. JMP tools showed that the relationship between player experience and success was best modeled by a quadratic, not linear, equation. It was also important to be careful of team statistics that were highly correlated with each other. For example, my team's turnovers and my opponents' steals essentially measure the same component of the game; including both statistics would run the risk of overweighting the importance of that component.
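As a rough illustration of the kind of curve-fitting JMP was doing, here is a minimal Python sketch, using made-up numbers rather than the actual kenpom/Massey data, that compares a linear and a quadratic fit of win percentage against average player experience:

```python
import numpy as np

# Hypothetical data: average years of player experience vs. season win fraction.
# (Illustrative numbers only -- not the actual data set I used.)
experience = np.array([0.4, 0.8, 1.2, 1.6, 2.0, 2.4, 2.8, 3.2])
win_pct = np.array([0.74, 0.62, 0.55, 0.52, 0.54, 0.60, 0.68, 0.76])

# Fit a straight line and a parabola, then compare residual error.
linear_fit = np.polyfit(experience, win_pct, deg=1)
quadratic_fit = np.polyfit(experience, win_pct, deg=2)

linear_sse = np.sum((np.polyval(linear_fit, experience) - win_pct) ** 2)
quadratic_sse = np.sum((np.polyval(quadratic_fit, experience) - win_pct) ** 2)

print(f"linear SSE:    {linear_sse:.4f}")
print(f"quadratic SSE: {quadratic_sse:.4f}")  # far smaller for U-shaped data like this
```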

I used a Python program to implement the logistic regression I designed and predict every possible tournament matchup. In later rounds of the tournament it is important to note that we are dealing with compounding probabilities. A team could face any of several possible opponents in a later round, so the probability that a team advances past that round is the probability it reaches the game, multiplied by the sum, over every potential opponent, of the chance that opponent appears in the game times the chance of beating that opponent. The final results were formatted in Tableau to create the above table.
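The compounding is easiest to see in code. Here is a minimal sketch of the round-by-round bookkeeping (not my actual program), assuming teams are listed in standard bracket order and that `p_win[i][j]` holds the regression's estimate that team i beats team j:

```python
import numpy as np

def advance_probabilities(p_win):
    """reach[i][r] = probability that team i reaches round r (r = 0 is the first round).

    Assumes teams are in bracket order, so in round r each block of 2**(r+1)
    consecutive teams contains the two halves that can meet in that round.
    """
    n = len(p_win)
    rounds = int(np.log2(n))
    reach = np.zeros((n, rounds + 1))
    reach[:, 0] = 1.0  # everyone starts in the opening round

    for r in range(rounds):
        block = 2 ** (r + 1)
        for i in range(n):
            start = (i // block) * block
            half = block // 2
            # Potential opponents come from the other half of this block.
            if i - start < half:
                opponents = range(start + half, start + block)
            else:
                opponents = range(start, start + half)
            # Compounding: chance I get there, times the sum over opponents of
            # (chance that opponent gets there) * (chance I beat that opponent).
            reach[i, r + 1] = reach[i, r] * sum(
                reach[j, r] * p_win[i][j] for j in opponents
            )
    return reach

# Tiny 4-team example with made-up matchup probabilities:
p = np.array([
    [0.0, 0.60, 0.70, 0.80],
    [0.40, 0.0, 0.50, 0.60],
    [0.30, 0.50, 0.0, 0.55],
    [0.20, 0.40, 0.45, 0.0],
])
print(advance_probabilities(p))  # columns: opening round, final, champion
```

A handy sanity check on this bookkeeping is that each column of `reach` sums to the number of slots left in that round, and the last column sums to 1, since exactly one team wins it all.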

How is this model different than other prediction methods?

There have been many, many previous efforts to correlate team and player statistics with winning games and use regressions to predict future games. For the most part the quality of the two teams (dictated by their season stats) does a good job of indicating who is likely to win, but sometimes teams with worse stats beat teams with better stats, and sometimes those results are predictable. Most sports analysts would call those games "bad matchups." My go-to example is Villanova and Butler. Over the past four years, Villanova has maintained a stronger statistical profile and consistently placed well above Butler in rankings and polls, but dropped three games in a row to Butler in 2017. That type of result inspired me to look for correlations between two teams' stat differentials and the result. Instead of a regression that predicts "how likely is a team this good to beat a team that good?" I wanted a regression that asked "how likely is this team to win against a team that's this much better than them at shooting free throws and this much worse at causing turnovers?" If another team popped up in the tournament that was clearly statistically inferior to Villanova but was strong in the same categories as Butler, my algorithm would have a better chance of picking up that potential upset.
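For the curious, the differential idea boils down to regressing game outcomes on the gaps between the two teams' stats rather than on each team's stats in isolation. A stripped-down sketch, where the features and numbers are placeholders rather than my actual pipeline:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Each row is one historical game described by stat differentials (team A minus team B):
# here, hypothetically, free-throw %, turnovers forced per game, and points per possession.
X = np.array([
    [ 0.05, -1.2,  0.08],
    [-0.03,  2.1, -0.04],
    [ 0.01,  0.5,  0.02],
    [-0.06, -0.8, -0.07],
])
y = np.array([1, 0, 1, 0])  # 1 if team A won the game, 0 otherwise

model = LogisticRegression()
model.fit(X, y)

# Probability of winning given how this team matches up, category by category,
# against this particular opponent -- not just which team is "better" overall.
new_matchup = np.array([[0.02, 1.5, -0.01]])
print(model.predict_proba(new_matchup)[:, 1])
```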

Why publish probabilities and not just predict winners?

College basketball is inherently unpredictable, but analysts have shown both success and improvement. There certainly are methods that provide vast advantages over a 50-50 coin toss, and some prediction algorithms have demonstrated upwards of 70 percent accuracy over tens of thousands of games. With better data and methods, accuracy has improved and likely will continue to improve, but the consensus is that the cap is well below perfection. There will never be a way to fully account for the freak accidents, the emotions, the technical failures, and the other uncountables that can affect the outcome of a game. I'm personally inclined to believe that college basketball is no more than 80 percent predictable. A list of only predicted winners undoubtedly contains incorrect results, and there would be no way to help you identify which ones those might be. Publishing a list of probabilities gives you an idea of which games are more competitive and likely to go either way.

How do I turn this information into a bracket?

The simplest way to translate this information into picks for your bracket would be to advance all the teams with a probability greater than .50 (fifty percent) of appearing in the second round, then advance all the teams with a greater than .50 probability of making the third round, and so on.
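In code, that rule is a single comparison per round. A tiny sketch, where `probs` is a placeholder standing in for the table above (one list of per-round probabilities per team):

```python
# Placeholder numbers, not values from the actual table.
probs = {
    "Team A": [0.95, 0.80, 0.60, 0.45, 0.30, 0.20],
    "Team B": [0.55, 0.30, 0.15, 0.08, 0.03, 0.01],
    "Team C": [0.45, 0.25, 0.12, 0.06, 0.02, 0.01],
    # ... one entry per team in the field
}

round_names = ["Round of 32", "Sweet Sixteen", "Elite Eight",
               "Final Four", "Championship game", "Champion"]
for rnd, name in enumerate(round_names):
    picks = [team for team, p in probs.items() if p[rnd] > 0.50]
    print(f"{name}: {picks}")
```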

Will this deliver a perfect bracket? Almost certainly not. Even if every probability were spot on (i.e., every team the algorithm gives a .25 probability of advancing actually has exactly a 25 percent chance), the chance that the more likely team would win every matchup would still be about 1 in a couple million. This table of probabilities probably favors more than a couple of losing teams, and you are smart enough to pick some of those games yourself. Perhaps a team is favored but its best player has a nagging injury and has under-performed in the last few games, or perhaps a team isn't favored but has just been gifted a tournament location 20 miles from campus, giving it pseudo-home-court advantage. Another thing to think about is the value of predicting upsets, which in many bracket contests are rewarded with bonus points. It could be a smart idea to bet on a 12 seed that is given a .4 probability, because the expected return is higher than for the safer 5 seed (compute .4 x reward for picking the upset versus .6 x reward for picking the winning high seed). There's a lot to think about, and the best way to treat this table is as a tool, not an authority.
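Spelling out the expected-value comparison from the parentheses above, with a hypothetical scoring scheme (the actual point values depend on your pool's rules):

```python
# Hypothetical pool scoring: 1 point for a correct pick, plus a 1-point upset bonus.
p_upset, upset_points = 0.4, 2        # the 12 seed pulls the upset
p_favorite, favorite_points = 0.6, 1  # the 5 seed holds serve

ev_upset = p_upset * upset_points           # 0.8 expected points
ev_favorite = p_favorite * favorite_points  # 0.6 expected points

print("Pick the 12 seed" if ev_upset > ev_favorite else "Pick the 5 seed")
```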

What about the play-in games?

The March Madness Tournament actually begins with 60 teams set and four spots to be filled by the winners of four play-in games. This aspect was very difficult to build into my prediction table because the four play-in winners are not slotted into the same places in the bracket every year (sometimes more than one go into the same region and none into another). So I wrote my program to handle a 64-team single-elimination tournament and predicted the play-in games separately, using the same logistic-regression-based algorithm. The predicted winners of the play-in games are among the teams included in the table.

EEG and Eye Tracking: My Summer in the Compton Lab

This summer I worked in Rebecca Compton's Cognitive Neuroscience lab, studying the effects of mind wandering, ERNs (error-related negativity), and error-related alpha suppression. A majority of the summer was spent testing out and preparing the lab's new Tobii eye-tracking system and working with Curry 7, the lab's new EEG software. After learning the two new programs, the other RAs and I began running participants for Becky's grant proposal.

In Study 1a, we examined the differences in pupil diameter after correct and incorrect responses. Using E-Prime and Tobii eye-tracking software, we designed a Stroop task (a color-word task where participants must press a key indicating the color of the word, not the meaning of the word) to analyze correct and incorrect responses. The task consisted of 6 blocks of 72 trials each. Participants responded with a ~93% overall rate of accuracy. In this study, we found a significant main effect of period, F(2,18) = 27.5, p < .001, indicating that pupil diameter was greatest following the response button press. We also found an interaction effect of trial type by period, F(2,18) = 7.5, p < .005, indicating that pupil diameter was significantly greater for errors compared to correct trials during the post-response period. This study replicated prior findings of error-related pupil dilation.

In Study 1b, we combined eye-tracking and EEG methods to simultaneously examine pupil diameter and EEG oscillations following correct and incorrect responses. As in Study 1a, we found that pupil diameter was significantly greater for error vs. correct trials during the post-response period. There was a main effect of period, F(2,18) = 5.5, p < .02, and an interaction effect of trial type by period, F(2,18) = 6.6, p < .008. Further, we found that there was more alpha suppression following error trials compared to correct trials, F(1,9) = 11.6, p < .01. These findings replicated the prior findings of Carp & Compton (2009) that there is greater alpha suppression following error than correct trials.

Following Studies 1a and 1b, this year we will be running participants for Study 1c. We hope to replicate these findings with a larger sample size and to examine between- and within-subjects correlations between error-related pupillary and EEG effects.

Thanks to Becky, Liz, Steph, and all of the Psych department and the KINSC for supporting our work this summer!

Photoelasticity technique for studying a granular system — it reveals the “force chains”

 

Week 4 in the Harvey Lab: Calcium Confirmation

Xenic Results

By “calcium confirmation”, I mean that we have determined intracellular calcium is NOT involved in our algicidal compound’s mechanism of action.  Sometimes that’s how it goes in science, especially in a field where so little is known; you have to weed through many negative results to find the positive hits.

This is the case with my phytoplankton bioassays. Each new crude extract has the potential to contain an algicidal compound, but many crudes are not active against the phytos or even enhance phyto growth (which is cool too!).

The element of chance in my work is one of my favorite aspects. When the crudes are spun down, they look pretty much the same. But when cell counts for a particular crude come back 10 times lower than they started in an experiment, I think to myself, “Wow, whoa, what makes this one so special?” Another exciting aspect is the fact that we have the technology to find out exactly what compound makes them “special,” and then we can go a step further and determine exactly why they function in this “special” way in the ocean.

My campaign to elucidate this mechanism of action continues next week, as I test the cells for reactive oxygen species.

Week 3 in the Harvey Lab: Phytos go M.I.A. during a P.I.A.

Back in Georgia, Week 3 was off to a great start. The phytoplankton culture looked strong, there were fresh crudes to test in my phytoplankton inhibition assay (PIA) like the one above, and we had a solid game of pickup soccer among the lab members.

But by Wednesday, the abundance of phytoplankton in my PIA was dropping across the board. This is not the type of cell death I am looking for, because even the phytos who hadn’t been treated with possibly algicidal crude extracts were disappearing.

Then I remembered what a helpful lab technician had told me about my phyto culture.

“They typically don’t like this kind of bottle, and the cap was on a little too tight,” she said as she showed me her beakers of phytoplankton strains, which were covered in loose-fitting aluminum foil.

Even though they appeared to be doing OK, I was suffocating my phytoplankton!! Just like land plants, my phytos can’t grow without CO2, and these stressed out cells were not reproducing as they normally do, which is about doubling daily.

The only way to rescue them was to…well, I couldn’t really rescue them because the experiment was flawed…so I poured the remaining ones into a container of 10% bleach labeled “Unhappy Phytos” and restarted the experiment with a new strain borrowed from the technician.

The experiment is now back on track, and the only consequence is having to come into the lab a few times on the weekend. Luckily my barracks is only 200 yards from the lab.

Week 2 in the Harvey Lab: Dilutions off Bermuda

It’s 2:45 am. I’m 80 kilometers off the coast of Bermuda. I’ve been awake for close to 20 hours. I’ve eaten at least 10 cookies and muffins from the galley. And to pass the time before 3 am, I’ve watched 180 minutes of Season 6 of Game of Thrones, a show I’d never seen before, so I have no idea what’s going on.

“It’s up!” I hear from down the main passageway. Now’s my time to shine.

I trudge out on the deck, zip up a life vest, twist on a hard hat, and grab some rope. My first job is to retrieve the ship’s yellow-framed CTD, a pretty darn cool instrument that measures the conductivity, temperature, and depth of the ocean water.

“Hold the rope farther from the cleat,” the marine technician on the other rope warns me. “They look innocent, but I knew a guy who lost a hand doin’ what you’re doin’.” Enough said; I took three healthy steps back from the cleat.

We lowered the CTD down snug onto its landing pad, and I started filtering water from its ring of grey Niskin bottles. Dr. Harvey’s experiments need someone to take saltwater samples every six hours so they can be diluted and measured for phytoplankton grazing. Since I’m more of a night person than a morning person at sea, I have the 3 am shift, and Dr. Harvey has the 9 am.

It was a whirlwind research cruise. Not a lot of sleep, but a lot of good samples and a lot of fun! The rest I’ll leave to the pictures:

There was safety (first as always)…

There were sunsets…

There were colleagues/friends/jokesters…

There was a sloth…

And at the end of it all, there were two delicious chickens (not pictured) inside a pig (pictured)…

Thank you to the crew of the R/V Atlantic Explorer!

Week 1 in the Harvey Lab: On and Over the Atlantic

My first week stationed at the Skidaway Institute of Oceanography (SkIO), affiliated with the University of Georgia, was an exciting one!

Nothing beats waking up and running some fresh new experiments on phytoplankton. Within just three days of testing, I determined which fractions of my crude bacterial extracts were algicidal, because the phytoplankton populations plummeted when treated with these compounds.

This new intel was sent straight back to Haverford, where more fractionation will help us crack the phytoplankton-killing code.

In the meantime, my SkIO advisor, Dr. Liz Harvey, and I prepared for a four-day cruise in the North Atlantic off the coast of Bermuda! Dr. Harvey has been wanting to study the night-day differences, or “diel” variation, in rates of phytoplankton grazing, that is, other microorganisms eating phytoplankton. And I have the privilege of helping her on the voyage, which will take us to the Bermuda Atlantic Time Series (BATS)!

BUT FIRST, SAFETY…

I volunteered to model proper immersion-suit entry. Immersion-suits are often called “Gumby Suits.” Can you see why?

We arrived at the Bermuda Institute of Ocean Sciences on Thursday, met with our colleagues from Oregon, Virginia, California, and Bermuda, and gathered materials like deionized water and 10-liter carboys for taking samples.

When the ship, the R/V Atlantic Explorer, is fully loaded and the tide is right, we will set sail on Saturday!