Data Science Lab

Director: Nathalie Japkowicz

The Computer Science Data Science Lab at American University develops and applies machine learning techniques to a variety of domains. Current areas of interest include data stream mining, anomaly detection, concept drift, class imbalance, and one-class learning. Current domains of application include cyber-security, mobile traffic classification, and genomics.




Current Members



Senior Personnel

Roberto Corizzo, Assistant Professor of Research from July 2019 until June 2020

Roberto Corizzo is an assistant professor of research in the Department of Computer Science at American University, where he is pursuing research in lifelong streaming anomaly detection. Prior to that, he was a research fellow in the Department of Computer Science at the University of Bari, Italy, and a research intern at the INESC TEC research institute in Porto, Portugal, under the supervision of Prof. Joao Gama. He has co-authored 16 articles, including 5 journal publications in venues such as IEEE Transactions on Industrial Informatics, Data Mining and Knowledge Discovery, and Information Sciences.

 

Main Project: Hierarchically and Laterally Growing Autoencoder (HLGAE)

 (Funded by DARPA)

 

Lifelong learning makes it possible to create adaptive systems that evolve over time, adapting to changing environmental conditions. This behavior is achieved by combining neural networks and machine learning techniques with statistical learning theory. Our aim is to extract adaptive models that can grow in two ways: laterally and hierarchically. Most data streams contain sub-concepts, and concept or sub-concept drifts occur naturally over time. The hypothesis is that a growing hierarchy of models can improve the recognition of diverse sub-concepts and thus enable more precise anomaly detection based on the most specialized and best-fitting sub-model.
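The idea of scoring each observation against its best-fitting sub-model can be illustrated with a minimal sketch. This is not the HLGAE implementation: each sub-model here is a simple centroid with a distance threshold rather than an autoencoder, only lateral growth is shown (hierarchical growth is left out), and the names `SubModel`, `GrowingEnsemble`, `slack`, and `grow_fraction` are hypothetical choices made for illustration.

```python
import math


class SubModel:
    """Toy stand-in for one sub-model (assumption: centroid + radius, not an autoencoder)."""

    def __init__(self, points, slack=3.0):
        n = len(points)
        dim = len(points[0])
        # Centroid of the points this sub-model was fitted on.
        self.centroid = [sum(p[i] for p in points) / n for i in range(dim)]
        dists = [math.dist(p, self.centroid) for p in points]
        mean = sum(dists) / n
        var = sum((d - mean) ** 2 for d in dists) / n
        # Distances beyond mean + slack * std are treated as anomalous.
        self.threshold = mean + slack * math.sqrt(var)

    def score(self, x):
        """Normalized distance to the centroid; values above 1.0 fall outside the sub-model."""
        return math.dist(x, self.centroid) / max(self.threshold, 1e-12)


class GrowingEnsemble:
    """Flat collection of sub-models that grows laterally when a batch does not fit."""

    def __init__(self):
        self.submodels = []

    def score(self, x):
        """Anomaly score under the best-fitting (lowest-scoring) sub-model."""
        if not self.submodels:
            return float("inf")
        return min(m.score(x) for m in self.submodels)

    def update(self, batch, grow_fraction=0.5):
        """Add a new sub-model if enough of the batch is unexplained. Returns True on growth."""
        if not batch:
            return False
        if not self.submodels:
            self.submodels.append(SubModel(batch))
            return True
        outside = [x for x in batch if self.score(x) > 1.0]
        if len(outside) / len(batch) >= grow_fraction:
            # Treat the unexplained points as a new sub-concept.
            self.submodels.append(SubModel(outside))
            return True
        return False
```

In this sketch, feeding a batch drawn from a region that no existing sub-model covers causes `update` to add a sub-model, after which points from that region receive low scores while points far from every sub-model continue to score as anomalous.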

 

 

Students

Reham Amin, Visiting PhD Student: from July 2019 until end of December 2019

Reham Amin is a visiting PhD student from the Faculty of Computers and Informatics, Suez Canal University, Ismailia, Egypt. Her research combines elements of cybersecurity, machine learning, and information visualization. At AU, she is creating a visualization system for hyperparameter tuning in neural networks.

 

 

 

Yohan Dauphin, Visiting Scholar: from September 2019 until end of February 2020

Yohan Dauphin is a visiting scholar majoring in Computer Science from the CPE Lyon engineering school in France. His research focuses on machine learning and deep learning for the detection of abnormalities in medical images.

 

 

Victor H. Barella, visiting Ph.D. student (University of São Paulo, Brazil): September 2019 - May 2020

Victor Barella is a Ph.D. student in the Institute of Mathematical and Computer Sciences at the University of São Paulo (ICMC-USP), Brazil, where he is supervised by Dr. Andre de Carvalho. His current research focuses on data characteristics and pre-processing techniques for imbalanced classification tasks. His research interests include imbalanced datasets, data complexity measures, meta-features, pre-processing techniques, and hierarchical classification. At American University, he is working on a meta-learning approach for imbalanced classification tasks under the supervision of Dr. Nathalie Japkowicz.

 

Alexis Godwin, American University, BS (expected 2020)  

Alexis Godwin is a senior at American University majoring in Computer Science. Her interests include data mining, remote sensing and geospatial data, and data analytics. She is currently applying statistical and machine learning methods to medical data.

 




Past Members



Senior Personnel

          

 

Zhen Liu, Visiting Scholar from March 2018 to March 2019

Zhen Liu is a visiting scholar from the School of Medical Information Engineering at Guangdong Pharmaceutical University in Guangzhou, China, where she has been working as a Lecturer. Her research interests lie in the areas of Machine Learning and Cyber Security. Her previous research addressed multi-class learning and the class imbalance problem, applied to mobile traffic classification. At American University, she is focusing on Anomaly Detection in Data Streams and devising methods for dealing with concept drift in the context of intrusion detection.

 

Li Liu, Visiting Scholar from September 2018 to February 2019

Li Liu is a visiting scholar from the School of Information Science and Technology at Huizhou University in Guangzhou, China, where she has been working as an Associate Professor. Her research interests lie in the areas of Machine Learning and Data Mining. Her previous research was in the area of dimensionality reduction, mainly focusing on manifold algorithms applied to content-based image retrieval. At American University, she is focusing on one-class classification and devising methods for dealing with sub-concept drift in complex domains.

 

Zhao Yang, Postdoctoral fellow from January 2017 until July 2018

Zhao Yang conducts research mainly in big data analytics, high performance computing, and statistical learning theory. Outside of work, his interests include hiking and swimming. He is the winner of the 2015 Alan Berman Research Publication Award from the Naval Research Laboratory (NRL), Washington, D.C. (Best Paper Award of the Department of the Navy).

 

Projects:

High Performance Data Mining Analytical Environment for Large Scale Cyber Security Data

In this project, we propose a framework for processing and analyzing large-scale cyber security data using a Big Data infrastructure. Existing Big Data solutions do not include high performance mechanisms to analyze large-scale cyber-security data. In this work, we extend current open-source platforms to support cyber-security data and demonstrate their analytical use with some common data types and the data mining technology provided by the open-source solution. The resulting framework provides a robust capability for sharing large-scale security data and making its outputs available to end users.

 

Efficient 3-D Object Detection for Large-Scale DEM (Digital Elevation Model) Data Sets

 

We propose a three-dimensional object detection method using large-scale DEM data. Recently, 2-D object detection has been widely used in navigation and geospatial computation. Nevertheless, conventional object detection techniques such as edge detection algorithms were originally designed as image processing techniques for finding the boundaries of objects within images. Edge detection algorithms have limited generality because they are designed for 2-D objects, which restricts the types of targets that can be detected. Our method implements machine learning algorithms that solve this problem and enable highly efficient, deterministic measurement of the features of a large-scale 3-D target from the DEM data set without any prior knowledge of those features. Using this technique and a prototype system that we developed, we also demonstrated a number of applications, including seamount detection, which can be used by surface vessels and unmanned underwater vehicles (UUVs).

 

Students

           

Jonathan Kaufmann is a senior in the Department of Computer Science. His research focuses on natural language processing and genomics. His previous research examined the economic impact of war on child soldiers, as well as reintegration.

 

 

 

Roberto Corizzo, visiting Ph.D. student (University of Bari, Italy): Spring-Summer 2017

 

Roberto Corizzo is currently finishing his Ph.D. in predictive models for streams of sensor data in the Department of Computer Science at the University of Bari, Italy. He is part of the KDDE research group coordinated by Prof. Donato Malerba. His research interests include big data analytics, data mining, and predictive modeling techniques for sensor networks.

         

Ashley Zhang, high school, rising senior (Langley High School): Summer 2017

Ashley Zhang is currently a rising senior at Langley High School. She is hoping to pursue an undergraduate degree in computer science and has programming experience mainly in Java and JavaScript. Her research interests include game design and machine learning.

      

James Clark, BS (expected 2018) from May 2017 until September 2018  

James Clark, from Plymouth, Massachusetts, is currently a senior majoring in Computer Science. His academic interests involve combining mathematics and computer science in areas such as scientific computing, computational science, and machine learning. His interests outside of school include hiking in the White Mountains of New Hampshire and playing guitar.

Project

Anomaly Detection for Intrusion Detection, Change-point detection

 

Ezra Schwartz, high school, rising senior (Montgomery Blair High School): Summer 2018

Ezra Schwartz is a rising senior at Montgomery Blair High School in the Science, Mathematics, and Computer Science Magnet program. He plans to pursue an undergraduate degree in computer science. He has programming experience primarily in Java, as well as some experience in Python and JavaScript. His research interests include machine learning and big data analytics.