RESEARCH ASST II (TEMP)

Responsibilities*

The U-M School of Nursing, Applied Biostatistics Lab (ABL) seeks a student who is interested in contributing to a summer interdisciplinary research project at the intersection of translational research and data science. The goal is to develop a large language model (LLM) that extracts text and numeric data from PubMed Central full-text articles. The immediate application is a model of cardiovascular disease risk among individuals with prediabetes.

  1. Develop and Fine-tune Large Language Model (LLM): Utilize your expertise in natural language processing (NLP) and machine learning to fine-tune a robust LLM to effectively extract relevant data pertaining to cardiovascular disease risk factors from the vast corpus of PubMed Central articles.
  2. Data Preprocessing and Annotation: Collaborate with the research team to preprocess and prepare existing raw textual data and annotate key information relevant to the study objectives. Implement strategies for data cleaning, normalization, and standardization to ensure the accuracy and consistency of extracted data.
  3. Algorithm Implementation and Optimization: Explore prompt engineering to enhance the performance and scalability of the LLM, optimizing predictive accuracy.
  4. Model Evaluation and Validation: Conduct thorough evaluations of the developed/fine-tuned LLM to assess its performance against predefined metrics and benchmarks. Validate the model outputs through rigorous testing and comparison with ground truth data, identifying areas for improvement and refinement.
  5. Documentation and Reporting: Maintain comprehensive documentation of the LLM development process, including codebase, experimental setup, and results analysis. Generate clear and concise reports summarizing key findings, challenges encountered, and future directions for research.
  6. Collaboration and Communication: Collaborate effectively with interdisciplinary team members, including biostatisticians, computer scientists, and domain experts, to ensure alignment with research goals and objectives. Communicate progress updates, findings, and insights in regular team meetings and presentations.

Desired Qualifications*

  • Currently enrolled in an undergraduate or master's program in computer science, data science, or a related field at the University of Michigan. Authorization to work in the U.S. is a precondition of employment. Applicants will not be sponsored for work visas.
  • Familiarity with fine-tuning of foundational LLMs.
  • Proficiency in programming languages and frameworks commonly used in NLP and machine learning, such as Python, TensorFlow, or PyTorch.
  • Strong foundation in NLP techniques, including text preprocessing, feature extraction, and sequence modeling.
  • Familiarity with deep learning architectures (e.g., transformers, recurrent neural networks) and their applications in language understanding tasks.
  • Experience with relevant libraries and frameworks for NLP, such as NLTK, spaCy, or Hugging Face Transformers.
  • Excellent problem-solving skills and a proactive approach to addressing challenges in algorithm development and optimization.
  • Effective communication skills and ability to work collaboratively in a team-oriented environment.
  • Must demonstrate excellent communication skills and attention to detail.
  • Training or experience related to developing LLMs.
  • Statistical or programming coursework, training, or experience is preferred.
  • Experience working on interdisciplinary teams is preferred.

U-M EEO/AA Statement

The University of Michigan is an equal opportunity/affirmative action employer.