Scott Cole

My personal website

    2025

    Python web app for studying combo-word Chinese vocabulary using Streamlit

    I remember new Chinese vocabulary best by breaking a word into its component words, so I made a game to practice this, e.g. 半 (half) + 岛 (island) = 半岛 (peninsula).

    Most frequent Chinese characters appearing in metro station names

    When I arrive in a new Chinese city, one problem I have is not knowing how to read the names of the metro stations I need to go to. I scraped a few thousand station names from Wikipedia to familiarize myself with the most frequent characters.

    2024

    Historical analysis of FIRE strategies

    On r/FIRE, a common question is ‘Is $X enough to retire?’. I analyzed some historical inflation and SPY return data to get a sense of how often different retirement strategies succeed.

    2023

    Google Sheet template for net worth tracking

    A lot of nerds have spreadsheets to track the details of their financials. I, of course, am proud of the custom-made columns in my own.

    Black Mirror episode ranking analysis

    Several websites have ranked all of the episodes in Seasons 1-6 of Black Mirror. I aggregated them into a spreadsheet along with my own ratings, and produced some figures and tables, including aggregate rankings and the most underrated and overrated episodes.

    I Think You Should Leave Skit Ranking: A systematic review and meta-analysis

    Multiple websites have ranked the best skits in “I Think You Should Leave.” I aggregated them into a spreadsheet along with my own ratings, and produced some figures and tables, including aggregate rankings and the most underrated and overrated skits.

    2022

    Fruit picking in the East Bay

    Many streets in Oakland and Berkeley are lined with fruit trees. I made a map of them and a flow chart to formally decide if it’s OK to pick the fruit from a tree.

    Hofstede’s 6 culture dimensions - Streamlit

    A recent Freakonomics episode described Hofstede’s framework of 6 culture dimensions. I made a web app to visualize the culture dimensions by country and compare them to a personal set of preferences.

    Metro door opening durations - Cross-city comparison

    Riding the metro in Mexico City, I immediately noticed how briefly their doors open at the stops. So I collected some data to compare this more quantitatively with the metro I take at home, BART.

    2021

    Misleading title of British Medical Journal article

    The title of a BMJ article indicates that the first dose of the Pfizer vaccine has 52% efficacy. I rant here about how the statistical analysis does not match the intuitive interpretation of efficacy after the first dose, and how this leads to inaccurate downstream citations.

    2020

    Income inequality in USA, visualized

    I had heard that “income inequality is getting worse,” but I never really had a quantified perspective on it. Therefore, I downloaded some data from the Census and visualized it here.

    2019

    My personal data from 10 apps

    I requested, processed, analyzed, and visualized data from Spotify, Twitter, Amazon, Facebook, Apple, LinkedIn, Uber, Venmo, Bank of America, and Tinder.

    Analysis of 10,000+ fact checks on Politifact

    Politifact is a handy nonprofit organization that rates the truth value of political statements, mostly by American politicians. For this post, I scraped the results of their fact checking since their inception in 2007 and visualized some trends across time, space, and the political spectrum.

    2018

    Estimating the prevalence of code sharing in scientific research

    I scraped over 100,000 full-text articles from the PubMed API to estimate how common code sharing is across different journals.

    Analysis of Insight Data Science Fellows

    Insight Data Science is a popular fellowship for PhDs going into data analytics. I wanted to get a better sense of where fellows came from and ended up, so I scraped some data from the Insight website and analyzed it.

    2017

    Delays of US domestic flights: trends and predictability

    Using data collected by the US Bureau of Transportation Statistics, we analyzed the relationships between basic properties of a flight (e.g. time of day, airline) and how much it was delayed. We also trained a classifier to predict if a flight would be delayed.

    Brain Oscillations and the importance of waveform shape

    We believe that the waveform shape of brain rhythms should be analyzed to extract more biological information from neural recordings.

    Free supercomputing for research: A tutorial on using Python on the Open Science Grid

    The Open Science Grid is a free supercomputing resource for academics. This step-by-step tutorial will allow any researcher to begin running their Python-based analysis using high-throughput computing for free.

    2016

    Poster popularity at SfN 2016: Comparing across states and countries

    I analyzed the geographic distribution of poster viewership for posters presented at the SfN 2016 annual meeting. Posters from some states (Minnesota) and countries (Netherlands) are more popular than others. But not significantly.

    Poster popularity at SfN 2016: Cognition and systems are hot. Development is not.

    At the annual neuroscience conference, I collected data to quantify the popularity of thousands of presented posters. As a first analysis, I related poster popularities to 8 of the major themes in neuroscience.

    Lucha Libre Taco Shop: Official burrito review

    Twenty-eight people applied burritology to assess their experiences eating burritos at the famous Lucha Libre Taco Shop in San Diego.

    Olympics 2016: Normalizing results by sport

    The United States is dominating in the Olympic medal count, but maybe that’s because of the disproportionate number of medals in swimming. What would the results look like if the number of medals was even for all sports?

    Which country is winning the 2016 Olympic games?: A Tableau Visualization

    An interactive visualization that lets you assign a weight to each medal category and compare performance across the globe. Playing around with data visualization in Tableau Public using the Rio Summer 2016 Olympic medals dataset.

    Extracting time series data from a published figure

    Rather than zooming in closely on small figure panels, we can use simple image processing to extract an estimate of the signals plotted in papers.

    100 Burritos in San Diego: 10-dimensional rating system

    A group of San Diegans quantified over 100 burrito experiences by decomposing their meal into 10 dimensions. This post describes the data and has some preliminary analysis.

    Phase-amplitude coupling: hidden in noise

    Phase-amplitude coupling is a common analysis on neural oscillations. But in order to obtain meaningful results, we need to first preprocess the signal.

    Empirical Mode Decomposition (EMD) tutorial

    Rhythmic signal analysis can be improved with a transform out of the time domain. While Fourier techniques are traditionally applied, EMD offers an alternative approach to frequency analysis.

    2015

    Our forgotten memories are still in our heads

    My take on a recent neuroscience study in which researchers could stimulate the brain to evoke a forgotten memory.

    Searching for San Diego’s finest burrito

    My first attempt at reviewing carne asada burritos across the city.

    Formal proof that Farey Sequences yield Ford Circles

    Inspired by a recent Numberphile episode, I explore why exactly Farey sequences will produce the Ford Circles fractal.

    Game theory to guess closest random number

    A search for the optimal strategy to win in a competitive number guessing game.

    Median estimation is superior to random number generation in guessing game

    A response to a recent Numberphile video which proposed a strategy for guessing random numbers. This new strategy is optimal.

    Memory replay probably isn’t the answer to explaining memory consolidation

    My naive thoughts on memory replay, a popular phenomenon in neuroscience.
