Projects – Amin Hosseiny Marani, PhD Candidate

An Interdisciplinary Approach to Understanding Cultures of Ethics in STEM

aminhosseiny — Thu, 23 Jun 2022 19:47:40 +0000

This project draws on an exchange between Dr. Griffin and her mentee, Jordan, recorded during our six months of observations of Dr. Griffin’s lab, that illustrates how scientific ethics are constituted in the everyday work of research and mentorship. Neither Jordan nor Dr. Griffin set out to have an explicit conversation about ethics. But in advising her mentee on how to professionalize the aesthetics of her data presentation, Dr. Griffin could not help but raise the ethics of data presentation: namely, how something as seemingly innocuous as an axis label could mislead an audience, intentionally or unintentionally. An expectation of ethical behavior was conjured out of an ordinary presentation of first-year research, helping to define a culture of ethics.

The everyday cultural production of ethics stands in distinction both to professional standards governing ethical scientific behavior and to formal training or coursework on ethics that students receive. Other work has explored the varied and often limited effects that codes of ethics (Bullock and Panicker, 2003; Shrader-Frechette, 1994) and student curricula (Antes et al., 2009; May and Luth, 2013; Mulhearn et al., 2017; Mumford et al., 2015) have on scientists’ thoughts and behavior around ethical challenges. Somewhat less attention has been paid to how ethics are defined and understood through the everyday interactions in research laboratories. Our project thus set out to answer the question: how do the organization, conversations, relationships, and overall culture of a research laboratory determine group members’ understandings and actions with regard to ethics? Our work follows previous laboratory studies examining “epistemic cultures” (Knorr-Cetina, 1999), but rather than identifying epistemes, we investigate ethical cultures.

In our study, we observed two lab environments for one academic year. We employed a mixed-methods approach combining ethnography, rhetorical analysis, and computational topic modeling. This combination enabled us to closely read participants’ language and dialogue, to recognize the cultural context of each laboratory, and to capture patterns that are not easily visible to human readers. While our study is not a comparative analysis, having two reference points helped reveal how different lab environments produce distinct ethical micro-cultures shaped by their particular community of participants.

Accepted for publication in the Bulletin of Science, Technology & Society (BSTS)

Are You Taking a Break from Eating Disorder Recovery? What Breaks Can Tell Us about Use of Social Media

aminhosseiny — Thu, 23 Jun 2022 19:40:37 +0000

People use social media to articulate life transitions such as childbirth, pregnancy loss, substance misuse, gender transition, mental health challenges, and eating disorders (ED). People rely on social networking sites (SNS) to connect with others, express their individuality, be entertained, or be informed about the world. Social media platforms, in turn, provide the technology for individuals to share their thoughts, feelings, ideas, interests, and more. These disclosures can be explicit or implicit, public or private, anonymous or tied to the poster’s identity. People use technology-mediated communication to share what their experiences of an issue or phenomenon have in common.

ED is one such phenomenon: people use social media to share their experiences, narrate their journeys, and express their feelings about the transitions they go through, which are often described as a recovery journey.

Little attention has been paid to how people use technology, specifically social media, to cope with the emotional turmoil and negative experiences felt during this transition phase.

While many previous works treat changes as binary major events (i.e., only one stage at a time), less examination has been done on quotidian transitions. Many life transitions, such as childbirth, gender transition, or promotion, are linear processes. Mental health recovery, however, is clearly non-linear: change happens over a period of time, and progress is interspersed with periods of relapse, impulses, and stagnation. It is therefore all the more important to explore how people use social media to explain or discuss the process of their recovery journeys.

Submitted to ACM Transactions on Computer-Human Interaction (TOCHI)

More than Good and Bad: Human Assessments of Machine Labeling Quality Have Multiple Dimensions

aminhosseiny — Thu, 23 Jun 2022 19:32:57 +0000

This project develops a novel measure for human assessments of quality in machine labeling tasks. The paper tests this measure across two studies, one using an unsupervised task (generating labels for topic models) and one using a supervised task (labeling framing in political news coverage). For each label, study participants responded to several items assessing it against a variety of criteria.

Exploratory factor analysis of these items reveals a two-factor latent structure in participants’ assessments of label quality that is consistent across both studies. Subsequent analysis demonstrates that this multi-item, two-factor measure can reveal nuances that would be missed using either a single-item measure of perceived label quality or established calculable performance metrics. The paper concludes by suggesting future directions for the development of human-centered approaches to evaluating NLP and ML systems more broadly.
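Factor analysis starts from the correlations among participants’ item ratings. As a rough illustration of how a two-factor structure can show up, the Python sketch below builds an item correlation matrix and counts the eigenvalues greater than one (the Kaiser criterion), one common heuristic for deciding how many factors to retain. This is an assumption for illustration only: the project does not say which retention criterion was used, and a full exploratory factor analysis would go on to extract and rotate factor loadings, which this sketch omits. All function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length rating columns."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlation_matrix(ratings):
    """ratings[r][i] is the rating respondent r gave on item i."""
    cols = list(zip(*ratings))
    return [[pearson(a, b) for b in cols] for a in cols]


def eigenvalues_symmetric(matrix, tol=1e-12, max_sweeps=100):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations."""
    a = [list(row) for row in matrix]
    n = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                for k in range(n):  # A <- A P (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # A <- P^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


def kaiser_factor_count(corr):
    """Number of factors suggested by the eigenvalue-greater-than-one rule."""
    return sum(1 for ev in eigenvalues_symmetric(corr) if ev > 1.0)
```

For a correlation matrix with two tight, mutually uncorrelated blocks of items (items 1 and 2 correlating at 0.8, items 3 and 4 correlating at 0.8), the eigenvalues are 1.8, 1.8, 0.2, and 0.2, so `kaiser_factor_count` returns 2. Jacobi rotations are used only to keep the sketch dependency-free; in practice a numerical library would compute the eigenvalues.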

This paper will be submitted soon…

Topic Modeling Stability

aminhosseiny — Thu, 23 Jun 2022 19:21:24 +0000

Topic modeling encompasses a variety of machine learning techniques for identifying latent themes in a corpus of documents. Generating an exact solution (i.e., finding the global optimum) is computationally intractable, so optimization techniques such as Variational Bayes or Gibbs sampling are used to approximate topic solutions by finding local optima. Such approximations often begin with a random initialization, so different initializations can lead to different results.
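To make the role of random initialization concrete, here is a minimal collapsed Gibbs sampler for LDA in Python. It is a toy sketch, not the setup used in this project: the corpus format (lists of integer word ids), the hyperparameters `alpha` and `beta`, and the function names are assumptions. The `seed` argument controls the random initial topic assignments, so runs with different seeds can settle into different local optima, while the same seed reproduces a run exactly.

```python
import random


def lda_gibbs(docs, n_topics, vocab_size, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA; returns topic-word counts.

    docs: list of documents, each a list of integer word ids in [0, vocab_size).
    """
    rng = random.Random(seed)  # the seed fixes the random initialization
    ndk = [[0] * n_topics for _ in docs]               # document-topic counts
    nkw = [[0] * vocab_size for _ in range(n_topics)]  # topic-word counts
    nk = [0] * n_topics                                # tokens per topic
    z = []                                             # topic of every token

    # Random initialization: each token gets a uniformly random topic.
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(n_topics)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1
        z.append(zd)

    topics = range(n_topics)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the token's current assignment from the counts.
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                # Unnormalized full conditional p(z = t | all other assignments).
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + vocab_size * beta)
                    for t in topics
                ]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1
    return nkw


def top_words(nkw, n):
    """Word ids of the n most frequent words in each topic."""
    return [sorted(range(len(row)), key=lambda w: -row[w])[:n] for row in nkw]
```

Comparing `top_words(lda_gibbs(docs, 2, 6, seed=1), 3)` with the same call using `seed=2` on a small corpus is a quick way to look for run-to-run variation; whether the topics actually differ depends on the data.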

Stability refers to the similarity among multiple runs of a single topic model: a highly stable topic model produces topic solutions that are partially or completely identical across runs.
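One simple way to quantify this, sketched below under stated assumptions, represents each run as a list of topics (each topic being its top-N terms), matches every topic in one run to its most similar topic in the other by Jaccard similarity, and averages the scores over all pairs of runs in both directions. This is only one of many possible stability metrics, and the function names are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two term lists."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0


def run_similarity(run_a, run_b):
    """Mean, over topics in run_a, of the best Jaccard match in run_b.

    Topic order does not matter because topics are matched by similarity.
    """
    return sum(max(jaccard(t, u) for u in run_b) for t in run_a) / len(run_a)


def stability(runs):
    """Mean symmetric similarity over all pairs of runs (1.0 = identical runs)."""
    pairs = [(i, j) for i in range(len(runs)) for j in range(i + 1, len(runs))]
    total = sum(
        (run_similarity(runs[i], runs[j]) + run_similarity(runs[j], runs[i])) / 2
        for i, j in pairs
    )
    return total / len(pairs)
```

For example, with `run1 = [["cell", "gene", "protein"], ["vote", "party", "election"]]` and `run2 = [["vote", "party", "election"], ["cell", "gene", "dna"]]`, `stability([run1, run2])` returns 0.75: the political topic matches exactly (1.0) and the biology topics share two of four distinct terms (0.5). The best-match rule ignores topic order but lets several topics match the same partner; a stricter variant would enforce a one-to-one matching, for example with the Hungarian algorithm.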

This project reviews different approaches to measuring stability and different techniques intended to improve it. Although a few works have analyzed, measured, or improved stability, no single paper has provided a thorough review of the different stability metrics and the various techniques for improving stability.

Under revision for ACM Computing Surveys

Text2Table

aminhosseiny — Thu, 23 Jun 2022 18:50:37 +0000

Named Entity Recognition (NER) and Relation Extraction (RE) are two common ways of summarizing clinical documents (e.g., discharge summaries). While deep learning methods have received a lot of attention lately, it is not practical to run them on every machine. Moreover, the limited availability of datasets biases fine-tuned models toward the corpus they were trained on.

With the availability of UMLS, SNOMED, and other NER/RE datasets, we were able to build a system that combines newer and older NLP techniques to improve the speed and performance of NER/RE models.

Text2Table is a two-part package for fast and reliable clinical named entity recognition and relation extraction. The first part, LinearNER, is a fast, easy-to-deploy approach that extracts named entities using a combination of database lookup (LevelDB), Conditional Random Field (CRF) classification, and an inverted index (Apache Solr). The second part, AINER, uses deep learning transformers to detect named entities and extract relations between them.
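The dictionary-lookup idea behind LinearNER can be sketched in a few lines of Python. This is a hypothetical illustration, not the Text2Table code: a plain in-memory dict stands in for LevelDB and Apache Solr, the CRF component is omitted, and the lexicon and entity labels are made-up examples rather than UMLS or SNOMED content. Terms are indexed by their first token, and the tagger greedily takes the longest matching term at each position.

```python
import re
from collections import defaultdict

# Hypothetical toy lexicon: term -> entity label.
LEXICON = {
    "chest pain": "PROBLEM",
    "pain": "PROBLEM",
    "myocardial infarction": "PROBLEM",
    "aspirin": "DRUG",
}


def tokenize(text):
    """Lowercase alphanumeric tokens; punctuation is dropped."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(lexicon):
    """Inverted index: first token of a term -> [(term tokens, label), ...]."""
    index = defaultdict(list)
    for term, label in lexicon.items():
        tokens = tokenize(term)
        index[tokens[0]].append((tokens, label))
    for entries in index.values():
        entries.sort(key=lambda entry: -len(entry[0]))  # longest terms first
    return index


def tag(text, index):
    """Greedy longest-match lookup; returns (matched text, label) pairs."""
    tokens = tokenize(text)
    found, i = [], 0
    while i < len(tokens):
        for term_tokens, label in index.get(tokens[i], []):
            n = len(term_tokens)
            if tokens[i:i + n] == term_tokens:
                found.append((" ".join(term_tokens), label))
                i += n
                break
        else:  # no lexicon term starts at this token
            i += 1
    return found
```

On "Patient reports chest pain; given aspirin for suspected myocardial infarction." with the toy lexicon, `tag` returns `[("chest pain", "PROBLEM"), ("aspirin", "DRUG"), ("myocardial infarction", "PROBLEM")]`; "chest pain" wins over "pain" because longer terms are tried first.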

The public code for the Text2Table project will be released soon…

A paper will be submitted for this project soon…

One Rating to Rule Them All? Evidence of Multidimensionality in Human Assessment of Topic Labeling Quality

aminhosseiny — Thu, 23 Jun 2022 17:09:51 +0000

Two general approaches are common for evaluating automatically generated labels in topic modeling: direct human assessment, or performance metrics that can be calculated without human assessment but still correlate with it. However, both approaches implicitly assume that the quality of a topic label is single-dimensional.
In contrast, this project provides evidence that human assessments of topic label quality consist of multiple latent dimensions. This evidence comes from human assessments of labels produced by four simple labeling techniques.

For each label, study participants responded to several items assessing it against a variety of criteria.
Exploratory factor analysis shows that these human assessments of labeling quality have a two-factor latent structure. Subsequent analysis demonstrates that this multi-item, two-factor assessment can reveal nuances that would be missed using either a single-item human assessment of perceived label quality or established performance metrics. The paper concludes by suggesting future directions for the development of human-centered approaches to evaluating NLP and ML systems more broadly.
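The contrast between a multi-item, two-factor assessment and a single score can be illustrated with a toy Python example. The item names, the item-to-factor mapping, and the ratings are hypothetical, and each factor score is simply the unweighted mean of its items (an assumption; factor scores can also be weighted by loadings). The two labels below receive the same overall mean but very different factor profiles, which is the kind of nuance a single rating would hide.

```python
from statistics import mean

# Hypothetical mapping of rating items to two latent factors.
FACTORS = {
    "factor_1": ["item_a", "item_b", "item_c"],
    "factor_2": ["item_d", "item_e"],
}


def factor_scores(ratings, factors=FACTORS):
    """Unweighted mean of a label's item ratings within each factor."""
    return {name: mean(ratings[item] for item in items) for name, items in factors.items()}


def overall_score(ratings):
    """A single undifferentiated score: the mean over all items."""
    return mean(ratings.values())


# Hypothetical mean ratings (assumed 1-5 scale) for two labels.
label_x = {"item_a": 5, "item_b": 5, "item_c": 5, "item_d": 2, "item_e": 2}
label_y = {"item_a": 3, "item_b": 3, "item_c": 3, "item_d": 5, "item_e": 5}
```

Both labels have an overall score of 3.8, yet `factor_scores(label_x)` gives factor_1 = 5 and factor_2 = 2, while `factor_scores(label_y)` gives factor_1 = 3 and factor_2 = 5.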

Bias as a Distinct Factor in Human Ratings of Machine Labeling

aminhosseiny — Thu, 23 Jun 2022 14:06:16 +0000

This project argues that human assessments of machine labeling can reveal bias as a distinct measure, separate from other perceptions of label quality. Human subjects were asked to assess the quality of automatically generated labels for a trained topic model using 15 distinct self-report questions. Exploratory factor analysis identified a distinct “bias” factor, a finding that is likely relevant to a wide variety of machine labeling tasks.
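Identifying a distinct factor such as “bias” amounts to finding a group of items that load primarily on one factor and only weakly on the others. The Python sketch below assigns each item to the factor with its largest absolute loading and leaves items below a cutoff unassigned. The loadings, item names, the 0.4 cutoff (a common rule of thumb), and the reading of factor 2 as “bias” are all hypothetical; they are not the study’s results.

```python
def primary_factors(loadings, cutoff=0.4):
    """Map each item to the index of its largest |loading|, or None if below cutoff."""
    assignment = {}
    for item, row in loadings.items():
        best = max(range(len(row)), key=lambda f: abs(row[f]))
        assignment[item] = best if abs(row[best]) >= cutoff else None
    return assignment


def items_on(assignment, factor):
    """Items whose primary loading is on the given factor."""
    return sorted(item for item, f in assignment.items() if f == factor)


# Hypothetical loadings of six items on three factors (factor 2 read as "bias").
LOADINGS = {
    "q1": [0.82, 0.10, 0.05],
    "q2": [0.75, 0.20, 0.02],
    "q3": [0.12, 0.70, 0.15],
    "q4": [0.08, 0.15, 0.78],
    "q5": [0.10, 0.05, 0.66],
    "q6": [0.30, 0.25, 0.20],
}
```

With these toy values, `items_on(primary_factors(LOADINGS), 2)` returns `["q4", "q5"]`, and q6 is left unassigned because none of its loadings reaches 0.4.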
