publications
Publications by category, in reverse chronological order. Generated by jekyll-scholar.
2025
- [Preprint] BiasConnect: Investigating Bias Interactions in Text-to-Image Models. Pushkar Shukla, Aditya Chinchure, Emily Diana, Alexander Williams Tolbert, Vineeth N. Balasubramanian, and 3 more authors. 2025.
The biases exhibited by Text-to-Image (TTI) models are often treated as if they are independent, but in reality they may be deeply interrelated. Addressing bias along one dimension, such as ethnicity or age, can inadvertently influence another dimension, like gender, either mitigating or exacerbating existing disparities. Understanding these interdependencies is crucial for designing fairer generative models, yet measuring such effects quantitatively remains a challenge. In this paper, we address this challenge by introducing BiasConnect, a novel tool designed to analyze and quantify bias interactions in TTI models. Our approach leverages a counterfactual-based framework to generate pairwise causal graphs that reveal the underlying structure of bias interactions for a given text prompt. Additionally, our method provides empirical estimates of how other bias dimensions shift toward or away from an ideal distribution when a given bias is modified. These estimates correlate strongly (+0.69) with the interdependencies observed after bias mitigation. We demonstrate the utility of BiasConnect for selecting optimal bias-mitigation axes, comparing TTI models on the dependencies they learn, and understanding the amplification of intersectional societal biases in TTI models.
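As a rough illustration of the counterfactual idea (not BiasConnect's actual code), the sketch below measures how fixing one bias axis in the prompt shifts the distribution of another axis relative to a uniform ideal; the axis lists, the total-variation metric, and the stubbed TTI-plus-classifier pipeline are all assumptions:

```python
# Illustrative sketch of pairwise bias-interaction measurement via
# counterfactual prompts. attribute_distribution is a stub standing in
# for "generate images, classify the attribute"; the uniform ideal and
# the TV-distance score are illustrative choices.
AXES = {
    "gender": ["man", "woman"],
    "age": ["young", "old"],
}

def attribute_distribution(prompt, axis, n_images=100):
    """Stub: generate n_images for `prompt` with a TTI model, classify
    the `axis` attribute in each, and return a frequency distribution."""
    raise NotImplementedError

def tv_distance(p, q):
    return 0.5 * sum(abs(p[k] - q[k]) for k in p)

def interaction(prompt, fixed_axis, measured_axis):
    """How does pinning `fixed_axis` in the prompt move the distribution
    over `measured_axis` toward (positive) or away from the ideal?"""
    ideal = {v: 1 / len(AXES[measured_axis]) for v in AXES[measured_axis]}
    base = attribute_distribution(prompt, measured_axis)
    shifts = []
    for value in AXES[fixed_axis]:
        cf = attribute_distribution(f"{prompt}, {value}", measured_axis)
        shifts.append(tv_distance(base, ideal) - tv_distance(cf, ideal))
    return sum(shifts) / len(shifts)
```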
2024
- [ECCV 2024] TIBET: Identifying and evaluating biases in text-to-image generative models. Aditya Chinchure*, Pushkar Shukla*, Gaurav Bhatt, Kiri Salij, Kartik Hosanagar, and 2 more authors. 2024.
Text-to-Image (TTI) generative models have shown great progress in the past few years in terms of their ability to generate complex and high-quality imagery. At the same time, these models have been shown to suffer from harmful biases, including exaggerated societal biases (e.g., gender, ethnicity), as well as incidental correlations that limit such a model’s ability to generate more diverse imagery. In this paper, we propose a general approach to study and quantify a broad spectrum of biases, for any TTI model and for any prompt, using counterfactual reasoning. Unlike other works that evaluate generated images on a predefined set of bias axes, our approach automatically identifies potential biases that might be relevant to the given prompt, and measures those biases. In addition, we complement quantitative scores with post-hoc explanations in terms of semantic concepts in the images generated. We show that our method is uniquely capable of explaining complex multi-dimensional biases through semantic concepts, as well as the intersectionality between different biases for any given prompt. We perform extensive user studies to illustrate that the results of our method and analysis are consistent with human judgements.
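A minimal sketch of the counterfactual-prompt step, assuming bias axes are already supplied (TIBET identifies relevant axes automatically, e.g., with an LLM); the toy axis lists are illustrative:

```python
# Illustrative sketch: for each bias axis judged relevant to a prompt,
# produce prompt variants that pin that axis to each of its values.
# Images generated for the variants are then compared against those for
# the original prompt to quantify how strongly the axis shifts output.
def counterfactual_prompts(prompt, axes):
    return {
        axis: [f"{prompt}, {value}" for value in values]
        for axis, values in axes.items()
    }

cfs = counterfactual_prompts(
    "a photo of a doctor",
    {"gender": ["male", "female"], "age": ["young", "elderly"]},
)
print(cfs["gender"])
# ['a photo of a doctor, male', 'a photo of a doctor, female']
```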
- [Preprint] Utilizing Adversarial Examples for Bias Mitigation and Accuracy Enhancement. Pushkar Shukla, Dhruv Srikanth, Lee Cohen, and Matthew Turk. 2024.
We propose a novel approach to mitigate biases in computer vision models using counterfactual generation and fine-tuning. While counterfactuals have been used to analyze and address biases in DNN models, the counterfactuals themselves are often generated by biased generative models, which can introduce additional biases or spurious correlations. To address this issue, we propose using adversarial images, that is, images that deceive a deep neural network but not humans, as counterfactuals for fair model training. Our approach leverages a curriculum learning framework combined with a fine-grained adversarial loss to fine-tune the model on adversarial examples. By incorporating adversarial images into the training data, we aim to prevent biases from propagating through the pipeline. We validate our approach through both qualitative and quantitative assessments, demonstrating improved bias mitigation and accuracy compared to existing methods. Qualitatively, our results indicate that, post-training, the model's decisions are less dependent on the sensitive attribute, and the model better disentangles the relationship between sensitive attributes and classification variables.
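A minimal sketch of the adversarial-counterfactual fine-tuning loop, assuming an FGSM-style perturbation and a plain cross-entropy objective in place of the paper's curriculum schedule and fine-grained adversarial loss:

```python
# Sketch only: FGSM stands in for the counterfactual generator, and a
# growing eps schedule could stand in for the curriculum. Not the
# paper's exact training procedure.
import torch
import torch.nn.functional as F

def fgsm_counterfactual(model, x, y, eps):
    """Perturb x so the network is fooled while the change stays
    imperceptible to humans (small eps)."""
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    return (x + eps * x.grad.sign()).detach().clamp(0, 1)

def finetune_step(model, optimizer, x, y, eps):
    x_adv = fgsm_counterfactual(model, x, y, eps)
    optimizer.zero_grad()
    # Train on original and adversarial views with the same label, so
    # the decision cannot lean on the perturbed (spurious) directions.
    loss = F.cross_entropy(model(x), y) + F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```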
2023
- [CVPR-W 2023] CAVLI - Using Image Associations To Produce Local Concept-Based Explanations. Pushkar Shukla, Sushil Bharati, and Matthew Turk. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2023.
While explainability is becoming increasingly crucial in computer vision and machine learning, producing explanations that link decisions made by deep neural networks to concepts easily understood by humans remains a challenge. To address this challenge, we propose a framework that produces local concept-based explanations for the classification decisions made by a deep neural network. Our framework is based on the intuition that if there is a high overlap between the regions of the image that the model most associates with a concept and the regions of the image that are useful for decision-making, then the decision is highly dependent on that concept. Our proposed CAVLI framework combines a global approach (TCAV) with a local approach (LIME). To test the effectiveness of our approach, we conducted experiments on both the ImageNet and CelebA datasets. These experiments demonstrate that our framework can produce explanations that are easy for humans to understand. By providing local concept-based explanations, our framework has the potential to improve the transparency and interpretability of deep neural networks in a variety of applications.
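A rough sketch of that intuition, assuming a per-pixel concept-association map (e.g., derived from a concept activation vector) and a boolean LIME importance mask; the IoU score and the top-20% threshold are illustrative choices, not CAVLI's exact formulation:

```python
# Sketch of the overlap score: high overlap between the top concept
# regions and LIME's important regions suggests the decision depends
# on the concept.
import numpy as np

def concept_dependence(concept_map, lime_mask, top_frac=0.2):
    """concept_map: (H, W) concept-association scores;
    lime_mask: (H, W) boolean mask of LIME's important superpixels."""
    k = max(1, int(top_frac * concept_map.size))
    thresh = np.partition(concept_map.ravel(), -k)[-k]
    concept_mask = concept_map >= thresh           # top-fraction concept regions
    inter = np.logical_and(concept_mask, lime_mask).sum()
    union = np.logical_or(concept_mask, lime_mask).sum()
    return inter / union if union else 0.0         # IoU as the dependence score
```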
2021
- [AAAI 2021] Text-based RL agents with commonsense knowledge: New challenges, environments and baselines. Keerthiram Murugesan, Mattia Atzeni, Pavan Kapanipathi, Pushkar Shukla, Sadhana Kumaravel, and 4 more authors. In Proceedings of the AAAI Conference on Artificial Intelligence, 2021.
Text-based games have emerged as an important test-bed for Reinforcement Learning (RL) research, requiring RL agents to combine grounded language understanding with sequential decision making. In this paper, we examine the problem of infusing RL agents with commonsense knowledge. Such knowledge would allow agents to efficiently act in the world by pruning out implausible actions, and to perform look-ahead planning to determine how current actions might affect future world states. We design a new text-based gaming environment called TextWorld Commonsense (TWC) for training and evaluating RL agents with a specific kind of commonsense knowledge about objects, their attributes, and affordances. We also introduce several baseline RL agents which track the sequential context and dynamically retrieve the relevant commonsense knowledge from ConceptNet. We show that agents which incorporate commonsense knowledge in TWC perform better, while acting more efficiently. We conduct user studies to estimate human performance on TWC and show that there is ample room for future improvement.
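A minimal sketch of the retrieval step using ConceptNet's public REST API; the relation whitelist and the crude substring match for relevance are simplifying assumptions, not the paper's retrieval mechanism:

```python
# Sketch: pull commonsense edges about objects in the current
# observation, which an agent could use to prune implausible actions
# (e.g., apple --AtLocation--> refrigerator suggests "put apple in
# fridge").
import requests

USEFUL_RELATIONS = {"AtLocation", "UsedFor", "CapableOf"}

def commonsense_edges(entity, limit=20):
    term = entity.lower().replace(" ", "_")
    url = f"https://api.conceptnet.io/c/en/{term}?limit={limit}"
    edges = requests.get(url, timeout=10).json().get("edges", [])
    return [
        (e["start"]["label"], e["rel"]["label"], e["end"]["label"])
        for e in edges
        if e["rel"]["label"] in USEFUL_RELATIONS
    ]

def relevant_knowledge(observation, entities):
    obs = observation.lower()
    return {e: commonsense_edges(e) for e in entities if e in obs}
```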
- [US Patent] Automated health condition scoring in telehealth encounters. John O’Donovan, Pushkar Shukla, Paul C. McElroy, Sushil Bharati, and Marco Pinter. US Patent App. 16/949,370, 2021.
2020
- [Preprint] Enhancing text-based reinforcement learning agents with commonsense knowledge. Keerthiram Murugesan, Mattia Atzeni, Pushkar Shukla, Mrinmaya Sachan, Pavan Kapanipathi, and 1 more author. arXiv preprint arXiv:2005.00811, 2020.
In this paper, we consider the recent trend of evaluating progress in reinforcement learning using text-based environments and games. This reliance on text brings advances in natural language processing into the ambit of these agents, with a recurring thread being the use of external knowledge to mimic and improve upon human-level performance. We present one such instantiation of agents that use commonsense knowledge from ConceptNet to show promising performance on two text-based environments.
2019
- [ACL 2019] What should I ask? Using conversationally informative rewards for goal-oriented visual dialog. Pushkar Shukla, Carlos Elmadjian, Richika Sharan, Vivek Kulkarni, Matthew Turk, and 1 more author. ACL, 2019.
The ability to engage in goal-oriented conversations has allowed humans to gain knowledge, reduce uncertainty, and perform tasks more efficiently. Artificial agents, however, are still far behind humans in having goal-driven conversations. In this work, we focus on the task of goal-oriented visual dialogue: automatically generating a series of questions about an image with a single objective. This task is challenging because the questions must not only be consistent with a strategy to achieve a goal, but also account for the contextual information in the image. We propose an end-to-end goal-oriented visual dialogue system that combines reinforcement learning with regularized information gain. Unlike previous approaches to the task, our work is motivated by the Rational Speech Act framework, which models the process of human inquiry to reach a goal. We test two versions of our model on the GuessWhat?! dataset, obtaining results that significantly outperform the current state-of-the-art models on the task of generating questions to find an undisclosed object in an image.
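A toy sketch of the information-gain core, computing the expected entropy reduction over candidate objects for each question; the answer model is supplied as data, and the paper's regularization term is omitted:

```python
# Sketch: pick the question whose expected answer most reduces
# uncertainty (entropy) over candidate target objects.
import math

def entropy(p):
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def information_gain(prior, answer_probs):
    """prior: P(object) as a list; answer_probs[a][o] = P(answer a | object o)."""
    gain = entropy(prior)
    n = len(prior)
    for a in answer_probs:
        p_a = sum(answer_probs[a][o] * prior[o] for o in range(n))
        if p_a == 0:
            continue
        posterior = [answer_probs[a][o] * prior[o] / p_a for o in range(n)]
        gain -= p_a * entropy(posterior)   # expected posterior entropy
    return gain

def best_question(prior, questions):
    """questions: {question_text: answer_probs}"""
    return max(questions, key=lambda q: information_gain(prior, questions[q]))
```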
2018
- [ETRA 2018] 3D gaze estimation in the scene volume with a head-mounted eye tracker. Carlos Elmadjian, Pushkar Shukla, Antonio Diaz Tula, and Carlos H. Morimoto. In Proceedings of the Workshop on Communication by Gaze Interaction, 2018.
Most applications involving gaze-based interaction are supported by estimation techniques that find a mapping between gaze data and corresponding targets on a 2D surface. However, in Virtual and Augmented Reality (AR) environments, interaction occurs mostly in a volumetric space, which poses a challenge to such techniques. Accurate point-of-regard (PoR) estimation is of particular importance to AR applications, since most known setups are prone to parallax error and target ambiguity. In this work, we expose the limitations of widely used techniques for PoR estimation in 3D and propose a new calibration procedure using an uncalibrated head-mounted binocular eye tracker coupled with an RGB-D camera to track 3D gaze within the scene volume. We conducted a study to evaluate our setup on real-world data using a geometric and an appearance-based method. Our results show that accurate estimation in this setting is still a challenge, though some gaze-based interaction techniques in 3D should be possible.
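For intuition, a common geometric baseline (not necessarily the paper's exact method) triangulates the 3D point of regard from the two gaze rays, which rarely intersect exactly; eye origins and directions would come from the calibrated head-mounted tracker:

```python
# Sketch: closest-approach triangulation of binocular gaze rays, taking
# the midpoint of the shortest segment between the left and right rays.
import numpy as np

def gaze_vergence_point(o_l, d_l, o_r, d_r):
    """o_*: 3D eye origins; d_*: unit gaze directions (numpy arrays)."""
    w = o_l - o_r
    a, b, c = d_l @ d_l, d_l @ d_r, d_r @ d_r
    d, e = d_l @ w, d_r @ w
    denom = a * c - b * b
    if abs(denom) < 1e-9:              # near-parallel gaze rays
        t, s = 0.0, e / c
    else:                              # ray parameters at closest approach
        t = (b * e - c * d) / denom
        s = (a * e - b * d) / denom
    p_l, p_r = o_l + t * d_l, o_r + s * d_r
    return (p_l + p_r) / 2             # midpoint of the shortest segment
```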
- [CVPR-W 2018] Automatic cricket highlight generation using event-driven and excitement-based features. Pushkar Shukla, Hemant Sadana, Apaar Bansal, Deepak Verma, Carlos Elmadjian, and 2 more authors. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2018.
Producing sports highlights is labor-intensive work that requires some degree of specialization. We propose a model capable of automatically generating sports highlights, with a focus on cricket, a sport with a complex set of rules that is played for longer than most other sports. Our model considers both event-based and excitement-based features to recognize and clip important events in a cricket match. Replays, audio intensity, player celebrations, and playfield scenarios are examples of cues used to capture such events. To evaluate our framework, we conducted a set of experiments ranging from user studies to a comparison between our highlights and the ones provided by the official broadcasters. The general approval by users and the significant overlap between both kinds of highlights indicate the usefulness of our model in real-life scenarios.
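A toy sketch of the audio-excitement cue, flagging windows whose energy exceeds a baseline (crowd and commentator excitement); the window length, the median baseline, and the 2x factor are illustrative, and the full system fuses this with event cues such as replays and celebrations:

```python
# Sketch: short-term audio energy thresholded against a median baseline
# to mark candidate "exciting" spans for the highlight reel.
import numpy as np

def exciting_segments(audio, sr, win_s=1.0, factor=2.0):
    """audio: 1-D waveform; sr: sample rate. Returns (start_s, end_s) spans."""
    win = int(win_s * sr)
    n = len(audio) // win
    energy = np.square(audio[: n * win]).reshape(n, win).mean(axis=1)
    flags = energy > factor * np.median(energy)
    spans, start = [], None
    for i, f in enumerate(flags):      # merge consecutive flagged windows
        if f and start is None:
            start = i
        elif not f and start is not None:
            spans.append((start * win_s, i * win_s))
            start = None
    if start is not None:
        spans.append((start * win_s, n * win_s))
    return spans
```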
2017
- [ICCV-W 2017] A computer vision framework for detecting and preventing human-elephant collisions. Pushkar Shukla, Isha Dua, Balasubramanian Raman, and Ankush Mittal. In Proceedings of the IEEE International Conference on Computer Vision (ICCV) Workshops, 2017.
Human-Elephant Collision (HEC) is a problem that is common across many parts of the world. There have been many incidents in the past where conflict between humans and elephants has caused serious damage, resulting in the loss of lives as well as property. This paper proposes a framework that relies on computer vision approaches for detecting and preventing HEC. The technique first recognizes the areas of conflict where accidents are most likely to occur. This is followed by an elephant detection system that identifies an elephant in the video frame. We propose two different detection algorithms, achieving mean average precisions of 98.621% and 97.667%. Once detected, the elephant's position is tracked with a particle filter and compared against the area of conflict; a warning message is displayed as soon as the two overlap. Results of applying these techniques to videos are discussed in the paper.
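A schematic sketch of the warning logic, with the detector and particle-filter tracker stubbed out as callables; the box format and the rectangle-overlap test are assumptions, not the paper's implementation:

```python
# Sketch: each frame, localize the elephant (detector, falling back to
# the tracker) and warn when its box enters the learned conflict zone.
def boxes_overlap(a, b):
    """Axis-aligned boxes as (x1, y1, x2, y2)."""
    return not (a[2] < b[0] or b[2] < a[0] or a[3] < b[1] or b[3] < a[1])

def monitor(frames, detect, track, conflict_zone):
    state = None
    for frame in frames:
        box = detect(frame) or (track(state, frame) if state else None)
        if box is None:
            continue
        state = box                      # carry the track forward
        if boxes_overlap(box, conflict_zone):
            yield "WARNING: elephant entering conflict zone"
```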
- [WACV 2017] A deep learning frame-work for recognizing developmental disorders. Pushkar Shukla, Tanu Gupta, Aradhya Saini, Priyanka Singh, and Raman Balasubramanian. In 2017 IEEE Winter Conference on Applications of Computer Vision (WACV), 2017.