Haverford to Harvard!

This summer, I participated in the Summer Program in Epidemiology at the Harvard T.H. Chan School of Public Health. I sought out this opportunity to further explore some of the concepts touched on in my health studies courses, which have steered me toward a career in public health. Since Haverford does not have a public health program, I knew I would need to seek out alternate options to learn more about this field. I applied to several programs, was accepted, and chose to attend the program at Harvard! The program was five weeks long, and during this time we took classes, attended lectures from Harvard faculty, did GRE prep, and worked with a mentor to learn about research.


Being at Harvard was an amazing experience. It took me a while to realize it, but I’ve read work by many of the faculty, and I even had the opportunity to meet some of them! It was great to network with all these incredible professors doing such interesting and important work in public health. However, one of the most rewarding parts of this program was the opportunity to meet and network with current graduate students. As someone who does not know anyone who has gone through the process of getting a Ph.D. or is going through it now, I found this especially useful in helping me learn what life is like as a graduate student. It is one thing to sit down and hear admissions talks and lectures, but meeting people who are currently going through it gives you real insight into that life. And I was lucky enough to meet a current student who is a Swarthmore alumna. She quickly became a mentor for me and has helped guide me throughout my five weeks and beyond.


My program consisted of eight students, and our cohort became very close over the course of our time together. I think one of the biggest takeaways for me from all of the programs I participate in is the other students I meet. Not only are they some of my good friends, but they are also great people to stay in contact with. I think it is very important to have a network of peers who are interested in what you are interested in. You can all share opportunities with each other now, and once you are all working and in higher roles at your respective jobs, you have a network of people to collaborate with and learn from.

For the research component of the program, I chose to focus on an independent project titled: Mass Incarceration and Public Health: An Opportunity for Intervention. Building off some of the work I have done for final projects in my health studies classes, I decided that this would be an interesting area to apply some of my newfound epidemiological knowledge. The primary focus of the work was an intervention critique of some of the existing health system interventions in carceral care settings. My favorite part of my presentation poster was my ecological diagram! I made it myself from scratch and got several compliments during my poster presentation.


Doing independent research on a topic that I find interesting was extremely exciting work! It is not easy work but the fact that I stuck with it and found it so rewarding is a good sign for me. This solidified my interest in public health and I cannot wait to explore the field further.

If you have any questions or are interested in applying to this program (or learning more about public health) feel free to reach out to me! casamuels@haverford.edu.

IGF1R Expression in Canine Immune Cells


Imagine a Great Dane standing next to a Chihuahua. With the Great Dane weighing almost twenty times as much as the Chihuahua, it’s impressive that these two animals are members of the same species, Canis familiaris. In addition to the size differences, there is an inverse relationship between breed size and longevity.1 The genetic basis of this wide variation in size has been a prominent subject of research, and in recent years, several specific regions in the canine genome have been found to influence size.2 In 2007, a genetic analysis of large and small breed dogs discovered a difference in the promoter region of the gene for a small protein, insulin-like growth factor 1 (IGF1), which is produced in the liver and circulates throughout the body. The promoter region of a gene helps regulate when the gene is transcribed into RNA, the first step in producing its protein. The change in the promoter region of IGF1 means less of the protein is produced and, consequently, smaller breeds have lower levels of IGF1 in their blood.3 Cells with a receptor protein, insulin-like growth factor 1 receptor (IGF1R), in their membranes can respond to IGF1. When IGF1 binds to IGF1R, a cascade of events occurs within the cell, leading to cell growth, proliferation, and differentiation. The interaction of IGF1 and IGF1R is especially important for the growth and development of an organism.4 As IGF1 has been studied more, it has become apparent that IGF1 plays a role beyond growth, specifically in anti-inflammatory immune pathways.5 The goal of my summer research is to investigate whether the canine immune system responds to IGF1, with the hypothesis that IGF1 could play a role in health differences between breeds.


This summer, I have been working in the lab of Dr. Jennifer Punt and Dr. John Wagner at the University of Pennsylvania. Jenni is currently Dean of One Health at the University of Pennsylvania School of Veterinary Medicine, as well as Haverford’s Pre-Vet advisor. Additionally, Jenni was previously a biology professor at Haverford. As an aspiring student of veterinary medicine, I’ve been in contact with Jenni since I arrived on Haverford’s campus. Because I wanted a veterinary experience different from those I’ve had in general clinical practice, I asked Jenni if I could work in her lab for the summer, and she said yes! I then applied for, and received, the KINSC Summer Scholar’s Scholarship to fund my research. In the lab, I’m working with three Penn undergrads, a second-year Penn Vet student, and two high school students. It’s been a great experience involving teamwork, communication, and research.


The majority of my work has been focused on using a technique called flow cytometry to investigate our hypothesis that there are canine immune cells that can respond to IGF1. The easiest way to tell if a cell can respond to IGF1 is to see if the cell has the receptor, IGF1R. Luckily, there is a well-established technique for identifying proteins, like IGF1R, on cells. It involves capitalizing on another type of protein that is naturally produced as part of the immune system: an antibody. Antibodies have two regions, an Fc region and a Fab region. The Fc region is constant between many different antibodies, whereas the Fab region is variable and binds to a specific target called an antigen. Scientists take advantage of this property: essentially, we can manufacture antibodies that bind specifically to any protein of interest. Since cells and antibodies are minuscule and cannot be readily visualized, antibodies of interest can be conjugated with a small molecule that fluoresces: a fluorophore.

This is where flow cytometry comes in. A flow cytometer uses fluidics and optics to analyze cells one-by-one. Through a system of lasers, the machine can determine the volume of the cell (forward scatter), the internal complexity of the cell (side scatter), and whatever fluorophores are attached to its surface. By incubating immune cells with various fluorophore-conjugated antibodies, we can begin to characterize the cells that can respond to IGF1.

So where do we find these immune cells? The immune system is partly composed of white blood cells, found, as expected, in the blood (among other places). The diagram below illustrates the basic pathway for how blood cells, including white blood cells, are made.

Modified from: opentextbc.ca/anatomyandphysiology/wp-content/uploads/sites/142/2016/03/2204_The_Hematopoietic_System_of_the_Bone_Marrow_new.jpg

For our project, we received canine blood samples from Penn Vet’s Veterinary Clinical Investigations Center (VCIC). Since the blood contains more than just white blood cells, we had to get rid of some of the extra material. One way to do this is to destroy, or lyse, the red blood cells with ammonium chloride. We can also spin the blood through a density gradient. The hypodense layer (above the gradient) is called the peripheral blood mononuclear cell (PBMC) fraction and contains mostly monocytes and lymphocytes (T and B cells).

As I mentioned previously, existing research indicates a connection between IGF1 and anti-inflammatory pathways, specifically pathways that suppress the immune system. Within many types of immune cells, there have been subsets identified whose main functions appear to be suppressive. The ones my project has focused on are polymorphonuclear myeloid derived suppressor cells (PMN-MDSCs), a subset of neutrophils; monocytic myeloid derived suppressor cells (M-MDSCs), a subset of monocytes; and T regulatory cells (T regs), a subset of T cells. Luckily, all three of these cell types are found in the PBMC fraction of blood.

To determine which, if any, of these might respond to IGF1, we stained them for characteristic surface markers in addition to IGF1R. The list of antibodies we used is below. Due to the limitations of our flow machine, we were only able to stain with four at a time, but we were innovative in the combinations we used.

Work done in the lab last summer yielded some promising results that suggested the vast majority of PMN-MDSCs might express IGF1R. PMN-MDSCs are generally defined as cells that are found in the low-density (PBMC) fraction and are CADO48A+ (CADO48A is an antigen present on canine neutrophils).

This figure, made by Trevor Esilu ’21, who worked in the lab last summer, looks at the CADO48A+ population, labeled as neutrophils, and highlights their expression of IGF1R (gray) versus the whole sample (black). In whole blood, only about 2-3% of neutrophils express IGF1R. In contrast, 87% of the CADO48A+ cells (which are PMN-MDSCs) in the low-density fraction express IGF1R. This result was very exciting!

At the beginning of this summer, we conducted an experiment that is a basic control for any antibody staining. Many cell types, especially neutrophils, have receptors on their surface that can bind to the Fc region of an antibody. If this happens, the cell will still be “positive” according to the flow cytometry, even though the antibody has not actually bound to the correct target.

In order to prevent this from happening, the sample has to be flooded with generic antibodies that do not fluoresce. This can be done by adding serum to the sample. Serum is blood without any cells, but it contains a wide variety of particles including hormones, lipids, cholesterol, sugars, proteins, and most important, antibodies. Generally, whole blood samples already contain serum, but PBMCs, due to the nature of their isolation, do not. One of the first things I did was conduct an experiment to see if the addition of serum changes the expression profile of IGF1R. The results from that experiment are below.

In both samples, whole blood and PBMC, there is significant apparent expression of IGF1R without serum. Once the serum is added, however, this positive reading almost disappears. This was a disappointing way to start my project, but we recognized that there was still some expression of IGF1R; we just had to determine exactly where.

We still had more suppressor populations to investigate, so we decided to move on to T regs. To investigate whether T regs express IGF1R, we stained for IGF1R along with three characteristic T reg markers: CD4, CD5, and Foxp3. Some of those results are below.

These graphs show IGF1R expression in CD5+ CD4+ cells that are either Foxp3+ (left) or Foxp3– (right). As you can see, there is a population of T regs (CD5+ CD4+ Foxp3+ cells) that does express IGF1R. Additionally, the proportion of IGF1R+ cells is higher among the Foxp3+ cells than among the Foxp3– cells. While these results clearly show expression of IGF1R in some T regs, some of our other samples indicated no expression of IGF1R in T regs. Although the lack of reproducibility is disappointing, these results are very interesting and provide direction for further investigation.
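The comparison described above boils down to gating: keep only events that are CD4+ and CD5+, split them by Foxp3, and ask what percentage of each group is IGF1R+. The sketch below illustrates that arithmetic only. The event values and the single intensity cutoff are made up for illustration (real cutoffs come from controls), and the function names are hypothetical rather than anything used in the lab.

```python
# Hypothetical intensity cutoffs per marker. In practice these would be set
# from unstained or isotype controls, not from these illustrative numbers.
THRESHOLDS = {"CD4": 100.0, "CD5": 100.0, "Foxp3": 100.0, "IGF1R": 100.0}


def is_positive(event, marker):
    """An event counts as marker-positive if its intensity meets the cutoff."""
    return event[marker] >= THRESHOLDS[marker]


def percent_igf1r_positive(events, foxp3_positive):
    """Percent IGF1R+ among CD4+ CD5+ events with the requested Foxp3 status."""
    gated = [
        e for e in events
        if is_positive(e, "CD4")
        and is_positive(e, "CD5")
        and (is_positive(e, "Foxp3") == foxp3_positive)
    ]
    if not gated:
        return None  # nothing fell inside the gate
    hits = sum(is_positive(e, "IGF1R") for e in gated)
    return 100.0 * hits / len(gated)


# Toy events (made-up intensities), one dict per cell.
toy_events = [
    {"CD4": 500, "CD5": 400, "Foxp3": 300, "IGF1R": 250},  # T reg, IGF1R+
    {"CD4": 500, "CD5": 400, "Foxp3": 300, "IGF1R": 10},   # T reg, IGF1R-
    {"CD4": 500, "CD5": 400, "Foxp3": 5, "IGF1R": 10},     # Foxp3-, IGF1R-
    {"CD4": 5, "CD5": 400, "Foxp3": 5, "IGF1R": 300},      # not CD4+, excluded
]

treg_pct = percent_igf1r_positive(toy_events, foxp3_positive=True)
non_treg_pct = percent_igf1r_positive(toy_events, foxp3_positive=False)
```

Real flow data are gated on continuous, often log-scaled distributions, so a single hard cutoff like this is a simplification.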

While my part in the project will likely end this summer, our work will be continued by a Superlab class at Penn in the fall. Hopefully they can use our results to continue investigating IGF1R expression in T regs by testing more samples and varying ages, sizes, and disease states. Other future directions of the project include investigating potential IGF1R expression in M-MDSCs and possible stimuli that can induce IGF1R expression. I look forward to seeing which directions the project takes!

– Johanna ’21


(1) Galis, F., Sluijs, I. V. D., Dooren, T. J. M. V., Metz, J. A. J., and Nussbaumer, M. (2007, March 15) Do large dogs die young? J. Exp. Zoolog. B Mol. Dev. Evol.

(2) Boyko, A. R., Quignon, P., Li, L., Schoenebeck, J. J., Degenhardt, J. D., Lohmueller, K. E., Zhao, K., Brisbin, A., Parker, H. G., vonHoldt, B. M., Cargill, M., Auton, A., Reynolds, A., Elkahloun, A. G., Castelhano, M., Mosher, D. S., Sutter, N. B., Johnson, G. S., Novembre, J., Hubisz, M. J., Siepel, A., Wayne, R. K., Bustamante, C. D., and Ostrander, E. A. (2010) A Simple Genetic Architecture Underlies Morphological Variation in Dogs. PLOS Biol. 8, e1000451.

(3) Sutter, N. B., Bustamante, C. D., Chase, K., Gray, M. M., Zhao, K., Zhu, L., Padhukasahasram, B., Karlins, E., Davis, S., Jones, P. G., Quignon, P., Johnson, G. S., Parker, H. G., Fretwell, N., Mosher, D. S., Lawler, D. F., Satyaraj, E., Nordborg, M., Lark, K. G., Wayne, R. K., and Ostrander, E. A. (2007) A Single IGF1 Allele Is a Major Determinant of Small Size in Dogs. Science 316, 112–115.

(4) Laron, Z. (2001) Insulin-like growth factor 1 (IGF-1): a growth hormone. Mol. Pathol. 54, 311–316.

(5) Smith, T. J. (2010) Insulin-Like Growth Factor-I Regulation of Immune Function: A Potential Therapeutic Target in Autoimmune Diseases? Pharmacol. Rev. 62, 199–236.

The Manifestation of Colorectal Cancers and How to Proactively Protect Yourself from the Third Most Common Form of Cancer.

Colorectal cancers have been widely studied due to their massive impact on Americans. For 2019, the American Cancer Society estimates that slightly over 100,000 new cases of colon cancer and 44,000 new cases of rectal cancer will be diagnosed. Additionally, colorectal cancer is expected to cause around 51,000 deaths in the U.S. in 2019 alone. Although these statistics are sobering, the numbers of cases and deaths have been decreasing as scientific understanding of the mechanisms of these cancers improves.

This summer, I am working in the Kalady Lab at the Lerner Research Institute of the Cleveland Clinic. My lab specializes in providing insight into the genetic underpinnings of colorectal cancers and applying this knowledge to the clinical realm in order to treat patients. I will be reporting on my project further along in the summer, but I first wanted to start with a blog post explaining how colon cancer manifests and how the worst aspects of this illness are often preventable.

Understanding tumor formation requires understanding the cellular mechanisms that tumor cells hijack. Tumor cells are abnormal cells that have somehow beaten the system and acquired the capacity to divide uncontrollably. The ways in which tumor cells diversify or transform often confer a selective advantage over regular cells, which explains why tumors can grow at such a fast and uncontrollable pace. For colorectal cancers, two major pathways of tumor development have been established. These pathways involve modifications to the genetic, epigenetic, and DNA mismatch-repair (MMR) systems that are associated with cell growth, differentiation, motility, and survival. For the purpose of this blog post, I am going to describe the genetic and epigenetic roots of tumor formation, followed by a shorter explanation of MMR.

The underlying genetics involved in cancer have been studied for decades, leading to an understanding that certain mutations can lead to benign and malignant tumors. The Chromosomal Instability Pathway, described by Bert Vogelstein in 1988-1990, establishes that a certain number of genetic mutations in specific genes are correlated with different stages of tumor development.

As seen in the figure, three genes are often referred to within the Vogelstein pathway as genes or processes that must be mutated or upset in order to undergo tumor formation. All of the mutations in the Vogelstein Pathway are additive; there isn’t a necessary order to the mutation process. Because tumor formation requires all of these mutations but not in any order, tumors are much more likely to win out as random mutation is a byproduct of the imperfection of nature.

The APC gene, or adenomatous polyposis coli gene, codes for a regulatory protein in the Wnt pathway. When a mutation occurs in APC, its protein’s ability to interact with and bind to β-catenin is lost. β-catenin is a signaling molecule that can call for the upregulation of genes associated with proliferation. Therefore, if APC is mutated and loses functionality, proliferative genes can be more highly expressed, leading to one of the hallmarks of cancer. The next gene mentioned in this figure is Kirsten-ras, or KRAS, which codes for a cell-signaling protein that works within the RAS/ERK signaling pathway. When RAS proteins are activated, they pass the signal on to the next protein in the pathway: BRAF. Eventually, the pathway reaches a protein called ERK that has the potential to upregulate more genes associated with proliferation and survival. Mutations in either KRAS or BRAF have been clinically observed in early adenomas, benign growths that need only one more mutation to become cancerous. The final gene that provides a barrier against tumor formation is the p53 gene. p53 is a transcription factor that assists in the control of the cell cycle. This transcription factor binds to DNA and can downregulate genes associated with survival and proliferation; however, if p53 is mutated, this function is lost. p53 is somewhat of a last resort in tumor prevention, although exactly why has not yet been fully elucidated.

The second pathway to colon cancer development was established in a study by Toyota et al. in 1999. This pathway arises from the CpG island methylator phenotype (CIMP), an observed phenotype in which certain DNA repair and cell maintenance genes are epigenetically silenced through promoter methylation. Physically, a protein complex is recruited to certain promoter locations in the DNA. At the promoter, the complex adds methyl groups to cytosines that share phosphodiester bonds with guanine nucleotides (CpG sites). When CpGs occur in clusters, the cluster is called an island, and its methylation can prevent transcription factors from interacting successfully with the promoter. Ultimately, methylation leads to downregulation of transcription of the downstream gene, thus cutting off any cellular outcome that depends on the target gene.
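To make "CpG clustering" concrete, here is a minimal sketch that counts CpG sites in a DNA string and computes two summary numbers often used to flag candidate CpG islands: the GC fraction and the observed-to-expected CpG ratio. One widely used rule of thumb treats a stretch of at least 200 bp with GC content above 50% and a ratio above 0.6 as an island, though definitions vary between studies. The function name and the example sequences are illustrative assumptions, not part of any study described here.

```python
def cpg_stats(seq):
    """Return (CpG count, GC fraction, observed/expected CpG ratio) for a DNA string."""
    seq = seq.upper()
    n = len(seq)
    c_count = seq.count("C")
    g_count = seq.count("G")
    # "CG" cannot overlap itself, so str.count finds every CpG site.
    cpg_count = seq.count("CG")
    gc_fraction = (c_count + g_count) / n if n else 0.0
    # Expected CpG count if C and G were placed independently: (C * G) / N.
    if c_count and g_count:
        obs_exp = (cpg_count * n) / (c_count * g_count)
    else:
        obs_exp = 0.0
    return cpg_count, gc_fraction, obs_exp


# Toy sequences, far shorter than a real island, just to show the arithmetic.
cpg_rich = cpg_stats("CGCGCG")
cpg_poor = cpg_stats("ATATAT")
```

In a real analysis these statistics would be computed in sliding windows along a promoter, and methylation status would come from a separate assay; sequence alone only shows where CpG sites are dense.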

CIMP has been identified only in relation to genes associated with regulatory processes within the cell, which, when hijacked, can lead to tumor formation. Most of the genome is actually methylated at any given time; proteins must often demethylate DNA via nucleotide excision repair or mismatch repair in order to express the downstream gene. Therefore, when studying CIMP, it is critical to have a thorough and accurate process for distinguishing among CIMP+, CIMP-, and CIMP0. This distinction has been drawn differently between studies, but the most common technique for identifying CIMP status is a five-marker panel established by Weisenberger et al. in 2006. This panel includes genes that are all associated with CIMP and whose methylation yields tumors with features correlated with CIMP+.

CIMP status matters because of the type of tumor that forms in CIMP+ versus CIMP- cases. CIMP+ tumors are often less differentiated and more aggressive. Prognostically, though, studies disagree on whether CIMP+ or CIMP- tumors have more clinically favorable outcomes. These differences arise from the multiple panels used to analyze CIMP status. Regardless of which panel is used, a certain gene related to MMR is always addressed: MLH1.

MLH1, mutL homolog 1, is a protein that assists in fixing errors in DNA replication prior to cell division. MLH1 complexes with PMS2 to cut out erroneous nucleotides and replace them with the correct ones as replication occurs. An MLH1 mutation causes the protein to lose its ability to safeguard DNA replication through mismatch repair. Without MLH1 functionality, mutation rates increase drastically, causing DNA hypermutability, designated microsatellite instability-high, or MSI-H. The outcome of microsatellite instability is that if a tumor forms, the tumor genome is unstable because MLH1 cannot spell-check its replication. Conversely, without an MLH1 mutation, a tumor is designated microsatellite stable, or MSS. This stability provides a more controlled environment for tumor growth and is disadvantageous prognostically.

Throughout this background on the formation of cancer, it is still crucial to acknowledge that although it may be interesting to study the causes of this disease and where the body is prone to failure, individual people still suffer from the outcomes of these failures. The only reason that we have access to such a great deal of information about colorectal cancers is because thousands of patients have been willing to undergo additional testing or provide samples of their tumors during surgery. Ideally, more effective screening should reduce the number of patients who have to experience the physical and mental ramifications of colorectal cancer. Currently, the American Cancer Society recommends that people with average risk for colorectal cancer should begin screening at age 45.

Major issues exist in making this screening accessible to all peoples, but if the option exists to receive the screen, there is no reason to not immediately schedule testing. Colorectal cancer is one of the only cancers that can be controlled or prevented simply by adequate screening, and for most, there is no reason why this illness should be a risk. Therefore, please advocate for screening or sign up for your own screening soon! This can easily be a life-changing or even life-saving decision.

Summer 2018: Researching ALS across Different Model Organisms

By Sophia Nelson

One of the world’s leading neurodegenerative diseases, Amyotrophic Lateral Sclerosis (ALS), damages the nervous system in largely unknown ways. Recent research has shown that a hexanucleotide repeat, GGGGCC in intron one of c9orf72, is the most frequent genetic cause of ALS, with the number of repeats in affected patients ranging anywhere from hundreds to thousands. While these repeat sequences are found in a non-coding region of the gene, they still appear to contribute to toxicity through repeat-associated non-ATG translation, which can begin at any point without the presence of the start codon, ATG. This unconventional translation allows the repeats to be translated into five different dipeptide repeat proteins (DPRs): GA, GP, GR, AP, and PR. One of these proteins, GA, forms paranuclear amyloidogenic aggregates, which have been found in the brains of human ALS patients. However, the direct role of GA (or any DPR) in disease pathology and toxicity is not yet known, particularly because the toxicity of GA varies heavily based on the model system in use.
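To see why one six-letter repeat yields exactly five dipeptide repeat proteins, it helps to translate the repeat in every reading frame on both strands. The sketch below does this with the standard genetic code. The helper names are illustrative, and the tract of four repeats is an arbitrary choice that is just long enough to read two codons in each frame.

```python
from itertools import product

# Standard genetic code, built in TCAG order ("*" marks stop codons).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(dna):
    """Return the antisense strand, read 5' to 3'."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def dipeptide(repeat_tract, frame):
    """Translate the two codons starting at the given frame offset (0, 1, or 2)."""
    window = repeat_tract[frame:frame + 6]
    return CODON_TABLE[window[:3]] + CODON_TABLE[window[3:]]


sense = "GGGGCC" * 4
antisense = reverse_complement(sense)

# Three frames on each strand give six translations, but GP appears on
# both strands, leaving five distinct dipeptide repeats.
dprs = {
    dipeptide(strand, frame)
    for strand in (sense, antisense)
    for frame in range(3)
}
print(sorted(dprs))
```

Because a repeat has no fixed starting point, a name such as AP describes the same polymer as PA; the names printed here simply reflect where each frame happens to begin.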

As a Velay Scholar this past summer, I got the chance to work with professors Robert Fairman and Roshan Jain to investigate the protein GA through a comparative study characterizing the aggregation and toxicity of GA in three model systems: worms (C. elegans), zebrafish (D. rerio), and fruit flies (D. melanogaster). The bulk of my work was in worms and flies, as my fish have not yet grown large enough for testing! I expressed GA in the neurons of each of the two model systems (ALS is known to attack motor neurons, so neuronal expression is important to study) and then performed behavioral and confocal imaging analysis for comparison. In the imaging studies, I was looking for two things: first, the large, paranuclear puncta that are a hallmark of GA aggregation; and second, the localization pattern within each organism. Behaviorally, I was attempting to understand whether GA expression within neurons was toxic by comparing organism performance in simple behavioral assays both with and without GA. For worms, this behavior was thrashing; for flies, it was the ability of larvae to crawl. I learned so much throughout the summer! I dissected fruit flies (both larvae and adults) and removed and imaged their brains, which are barely the size of a poppyseed. I learned behavioral testing across all three organisms, as well as PCR genotyping, staining and imaging techniques, confocal microscopy, and the beginning stages of biochemical assays, such as lysate preparation.

The dissection of fruit fly larvae and brains showed that GA was heavily concentrated within their developing brains, particularly in the progenitor of the neural column and the neuronal ganglia that develop and associate with the eyes. The confocal images of my fly brains were likely the coolest result of my whole summer! In worms, GA puncta are found throughout the body, heavily concentrated in the brain and along both the ventral and dorsal cords. Behavioral assays indicated that the presence of GA had a negative effect on C. elegans thrashing. Larval crawling data were collected and differences between controls and GA-positive larvae were found, but these data are still being followed up on to conclusively determine any effects. Biochemical results, obtained through SDD-AGE, will be gathered this spring. Overall, this study indicates that the form of GA aggregation is consistent across species despite slight differences in localization, and that the presence of GA appears to have a negative effect on behavior, indicating it may have a role in disease toxicity and should be tested further.

Performing these experiments was an incredible experience! My mentors were both amazing and taught me so much. It was my first time working in a research lab, and as a biology major with minors in neuroscience and health studies, I found that it only confirmed my interests further. After graduation, I now plan on going to medical school and graduate school for an M.D.-Ph.D. so that I can continue to perform research while also gaining a clinical perspective and directly helping patients. I am so excited to be able to apply the knowledge I gained this summer (and will continue to gain in my last year at Haverford) in my future career and to work on a topic that has the potential to have a direct impact on many people’s lives.

Dissected and removed fruit fly larval brain neuronally expressing the protein GA. The middle section is the progenitor of the neural column.

Summer 2018: Mitigating ACEs at Vanderbilt Medical Center

“It’s easier to build strong children than to repair broken men.” – Frederick Douglass

Adverse childhood experiences (ACEs) come in many shapes and forms, including neglect, abuse, and household dysfunction. But how influential are they in a child’s health outcomes? Research has repeatedly shown that ACEs can affect brain health enough to contribute to cognitive impairment, risky behavior in adult life, and long-term risks of disease and mental illness. Therefore, we move on to the next question: How do we mitigate ACEs? That’s where I come in.

ACEs can range from being parenting related, to environmental.

This summer, I’m working alongside Dr. Seth Scholer, a pediatrician at Vanderbilt University Medical Center Children’s Hospital. Dr. Scholer has spent over a decade conducting research on ACEs and how to successfully assess and alleviate them through pediatric primary care. With funding from the state of Tennessee, my research this summer has mostly focused on a randomized controlled trial (RCT) in which we hope to demonstrate that a brief parenting intervention can reduce unhealthy parenting tactics and thus nurture brain health in the clinic’s patients.

The utilization of an ACEs Screening Tool can improve health outcomes of children by identifying and addressing ACEs early in life.

My personal research project this summer is definitely simpler than an RCT, but has its own challenges. All previous research utilizing ACEs screening tools has taken place in pediatric clinics associated with research institutions such as Vanderbilt. However, the next step is employing a screening tool state-wide, which requires additional research on how to implement the screening tool in private medical practices.

Therefore, I have been implementing an ACEs Algorithm and screening tool at a private pediatric primary care clinic for my summer research project. The screening tool is a quick survey that measures a child’s household/environmental stressors and the degree to which their parent(s) use healthy discipline strategies. The ACEs Algorithm helps health-care providers interpret their patients’ scores and indicates whether a child is at low, medium, or high risk of ACEs. This is the first research study of its kind, and it requires working hands-on with the doctors and nurses at the private clinic to maximize the efficiency and effectiveness of the screening tool. Overall, this project has been a great opportunity to work along the front lines of ACEs research.

Health care providers use this ACEs Algorithm to interpret a child’s parenting-related ACEs and environmental ACEs (or other childhood stressors), after their caregiver completes a short ACEs Screening Survey. I worked with Dr. Scholer on the development of this algorithm throughout the summer, and this image is our final result.

As a Psychology major with minors in Neuroscience and Health Studies, this research experience perfectly fits the little niche formed from the intersection of my three fields of study. A typical day for me involves lots of patient/provider interaction and data management, with some manuscript and literature review writing stuck in between. This has helped me build concrete clinical research skills that are hard to learn in a classroom. Furthermore, I’m ecstatic about my ability to work within a research topic that is having a direct impact on people’s lives.


a nightmare about an important species which fails a quality control criterion


B. rapa is an important autopolyploid plant species in my research. I had spent a huge portion of my research time on selecting autopolyploids from polyploid plant species, while eliminating allopolyploid ones. During the height of my literature search, I had woken up to such a nightmare, which could potentially spoil all my previous effort. To my reassurance, the dream was only a dream…


art: June 18, 2018; text: July 5, 2018

How I got in Whalen’s Lab

As another internship season approaches, many friends are asking me how I got my summer 2017 biology research position without having taken any biology at Haverford. [1]

The process was not straightforward. I set out considering a major at Bryn Mawr, then halfway switched to Swarthmore, and ended up determined, “If biology, then Haverford.” Last summer was to get a taste of biology at Haverford.

Once the decision was made, I immediately reached out to Professor Whalen, whom I had chatted with at academic tea, a casual gathering of all departmental representatives at the beginning of every semester, to answer students’ questions. I had also browsed professors’ Haverford webpages, where their CVs and research directions are listed. Although my tentative major was biology, I could not understand the content of any project our biology professors listed. Still, marine science, drug resistance, and a photo of Professor Whalen smiling from a solid blue ocean-sky background attracted me above all.

I sent out my first email on Dec. 22, 2016, and went to play in New Haven. When I came back, a reply had been sitting in my mailbox since the day I left. What a quick response! To this day, Professor Whalen’s efficiency still surprises, motivates, and scares me from time to time. Back then, I immediately arranged to meet with my first-ever mentor.

She showed me all her undertakings and some summer opportunities, and asked me to show up at lab meetings the coming semester. Since I could only work on campus as an international student, we decided to begin with the “bacterial response to a chemical” project ongoing at her lab. That was it! I became part of Whalen’s lab. When the summer scholarship application season came, she instructed me to apply (Kovaric Fellowship [2]). When my application failed, she applied for funding from the Provost for me, so I could get paid. Everything was settled as early as Mar. 18, 2017, after which I just sat back and pictured the richness of the coming summer.


Takeaways for new applicants:

  1. Start collecting information early; browse professors’ and institutions’ webpages, and from there find out more
  2. After narrowing down your choices, reach out (sometimes it takes longer to receive a reply; don’t feel discouraged or overly anxious)
  3. Academic tea is a great space to ask any lay (or expert) questions about subjects, courses, majors, internships, …; professors are there for you
  4. Don’t be afraid if a professor’s research seems hard to understand!


[1] only sophomores and above could take biology courses at Haverford

[2] for funding opportunities, please see: www.haverford.edu/integrated-natural-sciences-center/programs-funding/student-research-funding

Summer Research Report: Predicting March Madness

Last summer I received funding from a Velay Fellowship to do sports analytics research at Davidson College in North Carolina. My main project was developing an algorithm to predict outcomes in the NCAA Division I March Madness Men’s Basketball Tournament. I finished my project in August and was able to backtest five years and demonstrate that my algorithm would have outperformed other methods (including fivethirtyeight, Power Rank, and numberFire), but I knew the first real test would come now, in March, as my algorithm first tackles a tournament field in real time. Below is the table of predicted outcomes it produced, along with some further explanation and insight into what I’ve learned.

What am I looking at?

Each row of this table represents one of 64 teams in the 2018 March Madness Tournament. Choose a team in the first column and move to the second column to see the probability that that team will appear in the second round or, equivalently, the chance they will win their first matchup. The probabilities range from 0 to 1 with, for example, .54 indicating a 54% chance of the team successfully making it to the second round. The following columns will give the chance that the team appears in each of the subsequent rounds of the tournament, and the final column gives the probability that the team will be named champions. The chance any given team will appear in the second round (Round of 32) is greater than the chance they will appear in the third round (Sweet Sixteen) which is in turn greater than the chance they will appear in the fourth round (Elite Eight) and so on. According to this table, Virginia has the highest chance (99.2 percent) of winning their first game and making it to the second round and a 28.66 percent chance of winning the whole tournament.

What exactly determined these probabilities?

I pulled data from kenpom.com and masseyratings.com and used JMP statistical software to look for correlations between over 50 team and player stats and game outcomes. Sometimes there is a clear correlation that can easily be modeled by a linear regression: for example, points per possession is strongly correlated with winning games. Other statistics are more complicated: for example, player experience doesn’t always strongly correlate with success. The best teams are often either heavy with “one-and-done” freshmen or loaded with experienced upperclassmen. JMP tools showed that the relationship between player experience and success was best modeled by a quadratic, not linear, equation. It was also important to be careful of team statistics that were highly correlated with each other. For example, my team’s turnovers and my opponents’ steals essentially measure the same component of the game; including both statistics would run the risk of overweighting the importance of that component.

I used a Python program to implement the logistic regression I designed and to predict every possible tournament matchup. In later rounds of the tournament, it is important to note that we are dealing with compounding probabilities. A team could face any of several possible opponents in a later round, so the probability that it wins that round is the chance it makes it to the game, multiplied by the sum over each potential opponent of the chance that opponent appears in the game times the probability the team beats that opponent. The final results were formatted in Tableau to create the above table.
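To make the compounding concrete, here is a minimal sketch of how round-by-round probabilities could be built up from head-to-head win probabilities. It is not my actual program: the function name `advancement_probabilities` and the `win_prob` matrix layout are assumptions made for illustration, and it assumes teams are listed in bracket order so that neighbors meet in the first round.

```python
import numpy as np

def advancement_probabilities(win_prob):
    """Chance each team survives each round of a single-elimination bracket.

    win_prob[i][j] is the (assumed given) probability that team i beats
    team j. Teams are listed in bracket order, and the number of teams
    must be a power of two. Returns an array `reach` where reach[r][i]
    is the probability that team i has won r games.
    """
    win_prob = np.asarray(win_prob, dtype=float)
    n = win_prob.shape[0]
    rounds = n.bit_length() - 1
    reach = np.zeros((rounds + 1, n))
    reach[0] = 1.0  # every team appears in its first game
    for r in range(1, rounds + 1):
        block = 2 ** r      # size of the sub-bracket settled in round r
        half = block // 2
        for i in range(n):
            start = (i // block) * block
            # Possible opponents come from the other half of the sub-bracket.
            if i - start < half:
                opponents = range(start + half, start + block)
            else:
                opponents = range(start, start + half)
            # P(team i plays the game) * sum of P(opponent plays) * P(i beats opponent)
            reach[r, i] = reach[r - 1, i] * sum(
                reach[r - 1, j] * win_prob[i, j] for j in opponents
            )
    return reach

# Hypothetical four-team field where every game is a coin flip.
coin_flip = [[0.5] * 4 for _ in range(4)]
print(advancement_probabilities(coin_flip))
```

Each row of the result should sum to the number of teams still alive after that round, which is a handy sanity check on any table like the one above.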

How is this model different than other prediction methods?

There have been many, many previous efforts to correlate team and player statistics with winning games and to use regressions to predict future games. For the most part, the quality of the two teams (dictated by their season stats) does a good job of indicating who is likely to win, but sometimes teams with worse stats beat teams with better stats, and sometimes those results are predictable. Most sports analysts would call those games “bad matchups.” My go-to example is Villanova and Butler. Over the past four years, Villanova has maintained a stronger statistical profile and consistently placed well above Butler in rankings and polls, but dropped three games in a row to Butler in 2017. That type of result inspired me to look for correlations between two teams’ stat differentials and results. Instead of a regression that predicts ‘how likely is a team this good to beat a team that good?’ I wanted a regression that looked at ‘how likely is this team to win against a team that’s this much better than them at shooting free throws and this much worse at causing turnovers?’ If another team popped up in the tournament that was clearly statistically inferior to Villanova but was strong in the same categories as Butler, my algorithm would have a better chance of picking up that potential upset.
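As a rough illustration of the differential idea, the sketch below feeds stat differences, rather than overall team quality, into a logistic function. The stat names, weights, and team numbers are all made up for the example; they are not the variables or coefficients from my model.

```python
import math

def matchup_win_probability(team_stats, opponent_stats, weights, intercept=0.0):
    """Logistic-regression-style win probability from stat differentials.

    Each weight multiplies (team stat - opponent stat), so the prediction
    depends on how the two teams compare category by category.
    """
    score = intercept
    for stat_name, weight in weights.items():
        score += weight * (team_stats[stat_name] - opponent_stats[stat_name])
    return 1.0 / (1.0 + math.exp(-score))

# Hypothetical weights and season stats, for illustration only.
example_weights = {"free_throw_pct": 4.0, "turnovers_forced": 0.15}
team_a = {"free_throw_pct": 0.78, "turnovers_forced": 12.0}
team_b = {"free_throw_pct": 0.70, "turnovers_forced": 15.0}

print(matchup_win_probability(team_a, team_b, example_weights))
```

With a zero intercept, swapping the two teams gives the complementary probability, so the two sides of a matchup always sum to one.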

Why publish probabilities and not just predict winners?

College basketball is inherently unpredictable, but analysts have shown both success and improvement. There certainly are methods that provide vast advantages over a 50-50 coin toss, and some prediction algorithms have demonstrated upwards of 70 percent accuracy over tens of thousands of games. With better data and methods, accuracy has improved and likely will continue to improve, but the consensus is that the cap is well below perfection. There will never be a way to fully account for the freak accidents, the emotions, the technical failures, and other uncountables that can affect the outcome of a game. I’m personally inclined to believe that college basketball is no more than 80 percent predictable. A list of only predicted winners undoubtedly contains incorrect results, and there would be no way to help you identify which those might be. Publishing a list of probabilities gives you an idea of which games are more competitive and likely to go either way.

How do I turn this information into a bracket?

The simplest way to translate this information into picks for your bracket would be to advance all the teams on your bracket with probability greater than .50 (fifty percent) of appearing in the second round, then advance all the teams with greater than .50 probability of making the third round, and so on.

Will this deliver a perfect bracket? Almost certainly not. Even if every probability were spot on (i.e., every team the algorithm gives a .25 probability of advancing actually has an exactly 25 percent chance), the chance that the more likely team would win in every matchup would still be 1 in a couple million. This table of probabilities will probably favor more than a couple of losing teams, and you are smart enough to spot some of those games yourself. Perhaps a team is favored but their best player has a nagging injury and has under-performed the last few games, or perhaps a team isn’t favored but has just been gifted with a tournament location 20 miles from campus, giving them pseudo-home court advantage. Another thing to think about is the value of predicting upsets, which in many bracket contests are rewarded with bonus points. It could be a smart idea to bet on a 12 seed that is given .4 probability if the expected return is higher than for the safer five seed (compare .4 x the reward for picking the upset against .6 x the reward for picking the favored five seed). There’s a lot to think about, and the optimal way to think of this table is as a tool, not an authority.
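The upset comparison is simple arithmetic, sketched here with made-up point values. The scoring of 2 points for a correct upset pick and 1 point for a correct favorite pick is an assumption for illustration; check your own contest's rules.

```python
def expected_points(win_probability, points_if_correct):
    """Expected contest points from picking a team with the given win chance."""
    return win_probability * points_if_correct

# Hypothetical scoring: a correct upset pick earns 2 points, a favorite 1 point.
upset = expected_points(0.4, 2.0)     # 12 seed given .4 probability
favorite = expected_points(0.6, 1.0)  # five seed given .6 probability

if upset > favorite:
    print("The upset pick has the higher expected return.")
else:
    print("The favorite has the higher expected return.")
```

Under these assumed probabilities, the upset only comes out ahead when its reward is more than 1.5 times the reward for the favorite (.6 divided by .4).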

What about the play-in games?

The March Madness Tournament actually begins with 60 teams set and four spots to be filled by the winners of four play-in games. This aspect was very difficult to build into my prediction table because the four play-in winners are not slotted into the same places in the bracket every year (sometimes more than one is put into the same region and none into another region). Thus I wrote my program to handle a 64-team single-elimination tournament and predicted the play-in games separately, using the same logistic-regression-based algorithm. The predicted winners of the play-in games are among the teams included in the table.

EEG and Eye Tracking: My Summer in the Compton Lab

This summer I worked in Rebecca Compton’s Cognitive Neuroscience lab, studying the effects of mind wandering, ERNs (error-related negativity), and error-related alpha suppression. A majority of the summer was spent testing out and preparing the lab’s new eye-tracking system, Tobii, and working with Curry 7, our new EEG software. After learning the two new programs, the other RAs and I began running participants for Becky’s grant proposal.

In Study 1a, we examined the differences in pupil diameter after correct and incorrect responses. Using Eprime and Tobii eye-tracking software, we designed a Stroop task (a word color task where participants must press a key indicating the color of the word, not the meaning of the word) to analyze correct and incorrect responses. The task consisted of 6 blocks of 72 trials each. Participants responded with a ~93% overall rate of accuracy. In this study, we found a significant main effect of period, F(2,18) = 27.5, p < .001, indicating that pupil diameter was greatest following the response button press. We also found an interaction effect of trial type by period, F(2,18) = 7.5, p < .005, indicating that pupil diameter was significantly greater for errors compared to correct trials during the post-response period. This study replicated prior findings of error-related pupil dilation.

In Study 1b, we combined eye-tracking and EEG methods to simultaneously examine pupil diameter and EEG oscillations following correct and incorrect responses. Similarly to Study 1a, we found that pupil diameter was significantly greater for error vs. correct trials during the post-response period. There was a main effect of period, F(2,18) = 5.5, p < .02, and an interaction effect of trial type by period, F(2,18) = 6.6, p < .008. Further, we found that there was more alpha suppression following error trials compared to correct trials, F(1,9) = 11.6, p < .01. These findings replicated the prior findings of Carp & Compton (2009) that there is greater alpha suppression following error trials than correct trials.

Following Studies 1a and 1b, this year we will be running participants for Study 1c. We hope to replicate these findings with a larger sample size and to examine between- and within-subjects correlations between error-related pupillary and EEG effects.

Thanks to Becky, Liz, Steph, and all of the Psych department and KINSC this summer for your support on our work!