Post

Corpus-based learning analytics

Web app built for the Language & Technology Research Group at Teachers College, Columbia University.

I’m a part of the Language & Technology Research Group at Teachers College (TC), Columbia University, a growing group of curious people researching the impact of emerging technology, especially natural language tech, on education. You can also see the trainings and workshops I developed for the group here.

Context

Each year, the Community Language Program (CLP) at TC serves hundreds of adult English learners in NYC and visiting scholars & professionals from all over the world. It is also a “language lab” for TC’s professors & graduate students in linguistics and pedagogy research. Researchers and instructors need a tailored, accessible, and efficient system for longitudinal analysis of the massive language output and learner-instructor interactions in CLP.

Design highlights

  • A web application with an intuitive interface and zero demand for users to write code.
  • Streamlines with CLP transcription data generated by human and AI annotators.
  • Built-in database supports unlimited uploading, parsing, storing, and deleting of transcriptions.
  • Core features include word queries, visual airtime breakdown (per class and overall), and speaker filtering.
  • Modular design allows for easy integration of additional features.

Tools

Python, Flask, HTML/CSS, SQL, Regex, Visual Studio Code, Git, ChatGPT as pair programmer.

Role

I developed the project independently, while continuously integrating feedback from research group members to ensure the project met collective goals. I was responsible for the conceptualization, architecture, database schema, user interface & experience design, development, testing, and documentation of the project.

Desktop View Screenshot of the search function.

Desktop View Screenshot of the airtime analysis function.

This project is undergoing iterative development. Contact me if you have any ideas or questions.

This post is licensed under CC BY 4.0 by the author.