Statement of Purpose

This page presents my statement of purpose in computer science as required for the ICS Ph.D. Portfolio. It summarizes my professional interests in research, teaching, service, and/or product development.

Background and Research Interests

Current work in software and safety vulnerabilities has not yet reached the stage where best practices meet usability. One need look no further than one's peers to find computers that are a few weeks behind in security patches [1]. On social media, tweets and posts spread like wildfire, questioning sites where hitting a login button a couple of times with an empty password can grant a user root access [2]. At the same time, developers question the quality of software vulnerability documentation, and whether cataloging organizations are up to the task, given the increasing delays in indexing the ever-growing number of vulnerabilities [3]. Aerospace safety incidents involving buggy software behavior have already been reported in incident databases, but have not been effectively collected, analyzed, and tied to safety risks [4], [5], [6].

These concerns highlight the state of the art in software and safety vulnerabilities and suggest several areas in need of improvement. Companies can benefit from improved methods for software vulnerability detection. Users, server administrators, software developers, and cataloging companies can benefit from methods that identify related software vulnerability content, simplifying awareness and content navigation. Existing text databases can benefit from more efficient textual content analysis to identify upcoming or relevant software vulnerabilities and trending safety incidents.

Existing methods have limitations, and further research is required to address these practical needs. Security and safety features in both software and text require labeled data, which is scarce: among open source projects, security issues are often locked to prevent exploits, limiting researcher access to their datasets. Likewise, identifying software vulnerability discussions in social media is currently a needle-in-a-haystack problem.

Even when data is available, existing machine learning methods still pose challenges in the presentation of results: several studies of security vulnerabilities disagree on measures of accuracy, leaving an inconclusive body of results upon which to build. Text mining methods struggle to summarize meaningful and practical vulnerability information in a form that is intelligible to a user. Textual tools and visualizations are still being actively researched and proposed, but few have been adequately tested in experiments to assess their true viability and usability.

It is in this current landscape of software and safety vulnerability research that I see an opportunity to make contributions to the field, by building upon existing literature and work from my chairperson and collaborators in code analysis and text mining.

Progress

Over the past three years I have designed and maintained a GitHub project incubator, PERCEIVE, which has helped us host, mentor, track, and collaborate on 17 independent studies, honors presentations, and capstones at both the undergraduate and master’s student level. In the spirit of an open source project, we worked through issues, pull requests, and code reviews. We created crawlers [7] that parse social media, using various heuristics to identify irrelevant information and spam. We also created several Python notebooks [8], [9], [10], explored ideas, and analyzed the collected data to support the project vision: to identify software vulnerabilities as concepts.

We have also identified and recycled abandoned and hard-to-reuse open-source code [11], [12]. My own personal contribution was the construction of an R package that serves as a data pipeline bridge between data and visualization tools [13].

Through the use of this pipeline, we have established research collaborations and, in some cases, funding for the research with the US Air Force Research Laboratory (AFRL), US CERT, NASA, University of Maryland, and Drexel University. We have now published three works supporting the validity of our vision and the potential for its future applicability in aerospace [14], [15], [16]. Ongoing collaboration work seeks to further strengthen the results by applying the pipeline to different datasets, and to extend the analysis of software vulnerabilities as bugs and concepts over time.

Goals

In the next year I intend to polish, leverage, and publish more of the pipeline and collaboration work, and to submit a proposal for a framework to identify software vulnerabilities as bugs and concepts, together with its applications. Within two years I hope to implement and defend the project vision.