STAT 427/627 Statistical Machine Learning

Spring 2025

Mondays 5:30 - 8:00 pm in Don Meyers room 121


Instructor: Michael Baron
Office: Meyers Building, DMTI-106D (East Campus)
Phone: (202) 885-3130
Email: baron at american.edu
Office hours: Monday 4:00 - 5:15 pm in DMTI-106D
Teaching Assistant: Jared Martin
TA email: jm1212a at american.edu
TA office hours: Friday 5:00 - 6:00 pm on Zoom

R corner

Download R from this site and install it on your system. The same site also hosts R manuals and other help.

Python corner

To use Python via Jupyter Notebook, install Anaconda from this site, start it, and click "Launch" in the Jupyter Notebook box. Here is a nice installation video guide. A detailed interactive Python tutorial is here.

      Classroom labs

Datasets

Data sources for the final project

  • A good collection of real data sets suitable for this project is in the UCI Machine Learning Repository.
  • A diverse collection of datasets from Hawkes prepared for your projects and supplied with project ideas.
  • A huge collection of data sets is linked from the data mining metasite KDnuggets.
  • If you are interested, you may get tons of Government data.
  • Biomedical data from various sources.
  • Detailed NFL data since 1999, supported by several R and Python packages.
  • Air and space exploration? Here are NASA databases.

    Social Justice

  • Detailed demographics data in the U.S.A. from the US Census Bureau
  • COVID-19 data by race and ethnicity from the COVID Tracking Project
  • Income disparity from the US Census Bureau
  • Poverty data from the US Census Bureau
  • Health insurance coverage from the US Census Bureau
  • Household income from the US Census Bureau
  • Race and Economic Opportunity Data Tables from the US Census Bureau
  • Labor Force Statistics from the Current Population Survey from the US Bureau of Labor Statistics
  • Race and Origin of Victims and Offenders, the National Crime Victimization Survey from the US Dept of Justice Office of Justice Programs
  • Racial profiling, arrests, citations, warnings - police data from Data.gov
  • Unemployment, poverty, educational attainment for the U.S. States and counties from the US Dept. of Agriculture
  • Data sources for studies on racial justice and health equity from the UCLA Center for the Study of Racism, Social Justice & Health.

    COVID-19 data

  • Humanitarian Data Exchange (HDX) is a metasite that publishes and updates complete COVID-19 data from the World Health Organization, Metabiota, Global Health 50/50, Assessment Capacities Project (ACAPS), and others. Location: https://data.humdata.org/, https://data.humdata.org/event/covid-19

  • Johns Hopkins University publishes detailed, up-to-date COVID-19 data on confirmed, recovered, tested, and fatal cases by country, state, and main outbreak location on HDX and GitHub. Location: https://data.humdata.org/dataset/novel-coronavirus-2019-ncov-cases, https://github.com/CSSEGISandData/COVID-19

  • GitHub, Inc., publishes and updates databases and accompanying software packages on the COVID-19 pandemic outbreak. Location:
    https://github.com/datasets/covid-19,
    https://github.com/github/covid19-dashboard,
    https://github.com/ImperialCollegeLondon/covid19model,
    https://github.com/neherlab/covid19_scenarios,
    https://github.com/nytimes/covid-19-data
    and by country: Italy, Japan, India, etc.

  • CEBM/Oxford data

    Handouts

    Food for thought



    Questions/comments/suggestions? Write to baron@american.edu