Monday, August 1

  • Transforming Data into Actionable and Accessible Words

  • Eunyee Koh
    Adobe Research
  • Generating natural language captions from images has been researched in depth. However, generating natural language captions from charts has been less explored due to a lack of datasets. In this talk, I will share our research approaches, patented and published at WACV, WWW, CHI, and EMNLP. Further, I will share how our work evolved from an intern project to academic papers, customer-facing demos, and finally a product.

  • Bridging AI and Human through Communication: Multimodality, Interpretability, and Fairness

  • Jungseock Joo
    UCLA
  • Numerous AI systems have been developed from data generated by human communication, and many of them have aided and facilitated human communication. In this talk, I will discuss current progress and future opportunities in multimodal human-AI communication, focusing on two themes: (1) learning from natural communication and (2) using interpretable and fair AI as a medium to communicate with the world. I will first introduce recent works leveraging human gestures as a natural interface to guide and teach autonomous agents such as robots and virtual navigation agents, and discuss our novel frameworks that can support unsupervised, communicative learning of human gestures. It will be shown that the semantics of gestures can be learned by agents via communication and incorporated into their policies. In the second part of my talk, I will explain how AI models can serve as a medium to understand human communication and social behaviors, and highlight the importance of fairness and interpretability in models. I will introduce several recent works on bias measurement and mitigation, including constructing a balanced dataset, mitigating annotators' cognitive biases, and counterfactual bias measurement. I will also discuss our latest work on explaining CNNs using unsupervised visual-semantic attention projection learning and demonstrate its applications for unsupervised visual data analytics.

  • Quantifying and extrapolating the capabilities of language models

  • Jaehoon Lee
    Google Brain
  • Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-future capabilities and limitations of language models. To address this challenge, we introduce the Beyond the Imitation Game benchmark (BIG-bench). BIG-bench currently consists of 200+ tasks, contributed by 400+ authors across 130+ institutions. Task topics are diverse, drawing problems from linguistics, childhood development, math, common-sense reasoning, biology, physics, social bias, software development, and beyond. BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models. We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench, across model sizes spanning millions to hundreds of billions of parameters. In addition, a team of human expert raters performed all tasks in order to provide a strong baseline.

Lunch Break · 12:30 - 13:30

  • Multimodal Learning for Videos

  • Paul Hongsuck Seo
    Google
  • Humans perceive the world through multiple sensory systems (e.g., vision, audition, touch, smell) that work together and complement each other. It is therefore natural to build models that process multiple modalities simultaneously, especially given rich multimodal online data such as modern video content containing visual frames, audio signals, human speech, and metadata. Moreover, the emergence of online video sharing platforms has made it possible to collect large-scale video datasets for such tasks. In this talk, I will introduce our recent studies on multimodal learning in videos and explore techniques that can improve the multimodal understanding capabilities of models. Through extensive sets of experiments, we show significant improvements from incorporating multimodal input signals and demonstrate the effectiveness of the proposed techniques.

  • Scaling Robot Learning with Skills: Furniture Assembly and Beyond

  • Youngwoon Lee
    USC
  • Despite the recent progress in robot learning, robotics research and benchmarks today are typically confined to simple short-horizon tasks. However, tasks in our daily lives are much more complicated, consisting of multiple sub-tasks and requiring high-dexterity skills, and the typical "learning from scratch" scheme hardly scales to such complex long-horizon tasks.

    In this talk, I propose to extend the range of tasks that robots can learn by acquiring a useful skillset and efficiently harnessing these skills. As a first step, I will introduce a novel benchmark for complex long-horizon manipulation tasks, the IKEA furniture assembly simulator. Then, I will present skill chaining approaches that enable sequential skill composition for long-horizon tasks. Finally, I will talk about how to learn a complex task efficiently using skills and skill priors extracted from diverse data.

  • Embodied AI: From Machine Learning to Learning Machines

  • Byoung-Tak Zhang
    SNU
  • Machine learning (including deep learning) has changed the paradigm of AI from rule-based "manual" programming to data-driven "automatic" programming. However, the current paradigm of machine learning requires an external system that provides the learner with data, which limits scalability. Here we argue that a learner can feed itself data autonomously if it is embodied, i.e., equipped with sensors and actuators. Through the perception-action cycle, an embodied AI can continually learn to solve problems in a self-teaching way by taking new actions, observing their outcomes, and correcting its own predictions, as humans and animals do. In this talk, I will show some of our studies in this direction of "(embodied) learning machine" research and discuss its implications for achieving truly human-level general AI.

Break · 16:15 - 16:35

  • Accurate Node Feature Estimation with Structured Variational Graph Autoencoder

  • Jaemin Yoo
    CMU
  • Given a graph with partial observations of node features, how can we estimate the missing features accurately? Feature estimation is a crucial problem for analyzing real-world graphs, whose features are commonly missing during the data collection process. This talk introduces our recent work to be presented at KDD 2022, which proposes SVGA (Structured Variational Graph Autoencoder) for accurate feature estimation. SVGA applies strong regularization to the distribution of latent variables by structured variational inference, which models the prior of the variables as a Gaussian Markov random field based on the graph structure. As a result, SVGA combines the advantages of probabilistic inference and graph neural networks, achieving state-of-the-art performance on real datasets.
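
    The setup can be illustrated with a toy graph autoencoder baseline (a minimal sketch, not SVGA itself; SVGA additionally regularizes the latent variables with a GMRF prior via structured variational inference, and all sizes below are illustrative):

    ```python
    import torch
    import torch.nn as nn

    def normalize_adj(A):
        """Symmetric GCN normalization D^{-1/2} (A + I) D^{-1/2}."""
        A_hat = A + torch.eye(A.size(0))
        d_inv_sqrt = torch.diag(A_hat.sum(dim=1).pow(-0.5))
        return d_inv_sqrt @ A_hat @ d_inv_sqrt

    class GCNFeatureEstimator(nn.Module):
        """Toy graph autoencoder for missing-feature estimation."""
        def __init__(self, in_dim, hidden_dim):
            super().__init__()
            self.enc = nn.Linear(in_dim, hidden_dim)
            self.dec = nn.Linear(hidden_dim, in_dim)

        def forward(self, A_norm, X):
            H = torch.relu(A_norm @ self.enc(X))  # propagate over the graph
            return self.dec(A_norm @ H)           # reconstruct all features

    # Train by reconstructing only the observed entries; the trained model
    # then fills in missing features from the graph structure.
    A = (torch.rand(10, 10) > 0.7).float()
    A = ((A + A.t()) > 0).float()                 # symmetric adjacency
    X = torch.randn(10, 16)
    mask = torch.rand(10, 16) > 0.5               # observed entries
    model = GCNFeatureEstimator(16, 32)
    X_hat = model(normalize_adj(A), X * mask)
    loss = (X_hat - X)[mask].pow(2).mean()        # loss on observed entries only
    ```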

  • DPar2: Fast and Scalable PARAFAC2 Decomposition for Irregular Dense Tensors

  • Jun-Gi Jang
    SNU
  • Given an irregular dense tensor, how can we efficiently analyze it? An irregular tensor is a collection of matrices whose columns have the same size but whose rows have different sizes. PARAFAC2 decomposition is a fundamental tool for dealing with irregular tensors in applications including phenotype discovery and trend analysis. Although several PARAFAC2 decomposition methods exist, their efficiency is limited for irregular dense tensors due to the expensive computations involved with the tensor.

    In this talk, we propose DPar2, a fast and scalable PARAFAC2 decomposition method for irregular dense tensors. DPar2 achieves high efficiency by effectively compressing each slice matrix of a given irregular tensor, carefully reordering the computations with the compression results, and exploiting the irregularity of the tensor. Extensive experiments show that DPar2 is up to 6.0x faster than competitors on real-world irregular tensors while achieving comparable accuracy. In addition, DPar2 is scalable with respect to the tensor size and target rank.
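
    The compression idea can be sketched in a few lines of NumPy (illustrative only; the actual DPar2 pipeline also carefully reorders computations and exploits the tensor's irregularity):

    ```python
    import numpy as np

    def compress_slices(slices, rank):
        """Compress each slice X_k (n_k x m) with a rank-r truncated SVD.

        Downstream PARAFAC2 iterations can then operate on the small
        factors instead of the full slices (sizes below are illustrative).
        """
        compressed = []
        for X in slices:
            U, s, Vt = np.linalg.svd(X, full_matrices=False)
            compressed.append((U[:, :rank], s[:rank], Vt[:rank]))
        return compressed

    # An irregular tensor: slices share columns (m = 30) but differ in rows.
    slices = [np.random.randn(n, 30) for n in (50, 80, 65)]
    small = compress_slices(slices, rank=5)
    ```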

  • ACAV100M: Automatic Curation of Large-Scale Datasets for Audio-Visual Video Representation Learning

  • Sangho Lee
    SNU
  • The natural association between visual observations and their corresponding sounds provides powerful self-supervisory signals for learning video representations, which makes the ever-growing amount of online video an attractive source of training data. However, large portions of online videos contain irrelevant audio-visual signals because of edited or overdubbed audio, and models trained on such uncurated videos have been shown to learn suboptimal representations. Therefore, existing approaches rely almost exclusively on datasets with predetermined taxonomies of semantic concepts, where there is a high chance of audio-visual correspondence. Unfortunately, constructing such datasets requires labor-intensive manual annotation and/or verification, which severely limits the utility of online videos for large-scale learning. In this work, we present an automatic dataset curation approach based on subset optimization, where the objective is to maximize the mutual information between the audio and visual channels in videos. We demonstrate that our approach finds videos with high audio-visual correspondence and show that self-supervised models trained on our data achieve competitive performance compared to models trained on existing manually curated datasets. The most significant benefit of our approach is scalability: we release ACAV100M, a dataset of 100 million videos with high audio-visual correspondence, ideal for self-supervised video representation learning.

Tuesday, August 2

  • DeepSpeed: Training and Inference Optimizations for Deep Learning

  • DeepSpeed Team
    Microsoft Research
  • DeepSpeed optimizations advance the state of the art in deep learning along two complementary dimensions: (i) improving hardware utilization to achieve peak performance and (ii) reducing compute and data requirements to minimize resource consumption. DeepSpeed optimizations include sophisticated compute, memory, communication, and parallelism techniques to improve hardware performance, as well as advanced compression techniques to reduce resource consumption. These powerful optimizations are intuitively integrated into the DeepSpeed library to enable model scientists to easily adopt and combine them in their training and inference workloads. In this talk, we will present some of the training, inference, and compression techniques available in DeepSpeed.
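
    As a point of reference, adopting DeepSpeed in a PyTorch training script largely amounts to wrapping the model with a JSON-style config (a minimal sketch; the config values below are illustrative, not recommendations):

    ```python
    # Minimal sketch of wrapping a PyTorch model with DeepSpeed.
    import torch
    import deepspeed

    model = torch.nn.Linear(1024, 1024)  # stand-in for a real model

    ds_config = {
        "train_batch_size": 32,
        "fp16": {"enabled": True},           # mixed-precision training
        "zero_optimization": {"stage": 2},   # ZeRO memory optimization
        "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
    }

    # deepspeed.initialize returns an engine that handles parallelism,
    # optimizer state partitioning, and mixed precision internally.
    model_engine, optimizer, _, _ = deepspeed.initialize(
        model=model,
        model_parameters=model.parameters(),
        config=ds_config,
    )

    # Training then uses the engine's forward/backward/step methods:
    #   loss = model_engine(batch).sum()
    #   model_engine.backward(loss)
    #   model_engine.step()
    ```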

  • Motion and Behavior Generation for Virtual Avatars in Metaverse

  • Jungdam Won
    Meta AI
  • The Metaverse has gained a lot of attention as a new platform for the next-generation Internet because it has the potential to enable users who live in different locations and times to share the same experience. It has come to be considered one of the most essential technologies in the present pandemic era. In the Metaverse, virtual avatars are fundamental components because users perform every activity and interaction through their avatars. Generating realistic motions for these avatars is crucial to providing a more immersive experience for users. In this talk, I'll share the challenges we have experienced and tackled in generating plausible motions for virtual avatars. More specifically, I'll introduce two lines of work: one for user-driven avatars and the other for autonomous avatars.

  • Hindsight Photography

  • Keunhong Park
    Google
  • Taking a good photograph can be a time-consuming process, and it usually takes several attempts to capture a moment correctly. This difficulty stems from the many factors that make up a photo, such as framing, perspective, exposure, focus, or subject pose. Getting even one of these factors wrong can spoil a picture, even if the rest are perfect. To make matters worse, many of these factors are often out of our control; for example, a wind gust may displace the subject's hair, or a bird may fly by and occlude the shot. What if we could go back and fix some of these aspects? I will present 'hindsight photography' for capturing photos and videos in a way that enables us to go back and revise them with the benefit of hindsight. I will talk about my recent work, Nerfies and HyperNeRF, and show how they enable hindsight photography, allowing us to modify aspects of photographs that would be difficult with conventional media.

Lunch Break · 12:30 - 13:30

  • High-speed Serving for Large-scale Transformer-based Generative Models

  • Byung-Gon Chun
    SNU and FriendliAI
  • Large-scale Transformer-based models trained for generation tasks (e.g., OpenAI GPT-3, Google PaLM, Meta OPT) have recently attracted huge interest, emphasizing the need for system support for serving models in this family. However, existing inference serving systems do not perform well on this type of workload, which has a multi-iteration characteristic, due to their inflexible scheduling mechanisms that cannot change the current batch of requests being processed. In this talk, we will present FriendliAI's Orca, a distributed serving system for Transformer-based generative models. We propose two new techniques, iteration-level scheduling and selective batching, that address the problems of existing serving systems. Our evaluation on a GPT-3 175B model shows that Orca can significantly outperform NVIDIA FasterTransformer in terms of both latency and throughput: a 36.9× throughput improvement at the same level of latency. We will also discuss system support for large-scale AI more broadly.
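
    The key scheduling idea can be conveyed with simplified pseudocode (hypothetical helper names, not the actual Orca implementation): the scheduler re-forms the batch at every decoding iteration, so finished sequences return immediately and waiting requests join without delay:

    ```python
    # Simplified illustration of iteration-level scheduling (hypothetical
    # request/model interfaces; not FriendliAI's implementation).
    def serve(request_queue, model, max_batch_size):
        running = []  # requests currently being decoded
        while True:
            # Admit waiting requests at EVERY iteration, not per batch.
            while len(running) < max_batch_size and request_queue:
                running.append(request_queue.pop(0))
            if not running:
                continue  # a real server would block on new arrivals
            # Run exactly ONE decoding iteration for the current batch.
            # Selective batching applies batched ops only where shapes agree.
            tokens = model.decode_one_step(running)
            finished = []
            for req, tok in zip(running, tokens):
                req.output.append(tok)
                if tok == req.eos or len(req.output) >= req.max_tokens:
                    req.complete()        # respond immediately; no waiting
                    finished.append(req)
            running = [r for r in running if r not in finished]
    ```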

  • Unsupervised Skill Discovery

  • Gunhee Kim
    SNU
  • In this talk, I will present some of our recent works on unsupervised skill discovery in reinforcement learning, whose goal is to teach agents to acquire inherent skills from environments without any external rewards or supervision. First, I deal with how to make policy gradient (PG) methods invariant to the time discretization for control. Second, I propose a novel unsupervised skill discovery method named Information Bottleneck Option Learning (IBOL) that leverages the information bottleneck principle from representation learning. Finally, I discuss Lipschitz-constrained Skill Discovery (LSD), which encourages the agent to discover more diverse, dynamic, and far-reaching skills than previous unsupervised skill discovery methods. These works were recently published in ICML 2021, NeurIPS 2021, and ICLR 2022.

Break · 15:00 - 15:30

  • Graph-based Representation Learning for Class Discriminative Feature Embedding

  • Jongin Lim
    SNU
  • Representation learning often aims to train a deep neural network to yield class-discriminative features, where the embedded features from the same class are close to each other while those from different classes are far apart. To this end, learning from relational information between data samples plays an important role. In this talk, I will present representation learning schemes that leverage graph modeling, where nodes represent data samples and edges represent the relations between them. Specifically, I'll share our two recent works published in AAAI 2021 and CVPR 2022, respectively.

  • Exact Optimal Accelerated Complexity for Fixed-Point Iterations

  • Jisun Park
    SNU
  • Despite the broad use of fixed-point iterations throughout applied mathematics, the optimal convergence rate of general fixed-point problems with nonexpansive nonlinear operators has not been established. This work presents an acceleration mechanism for fixed-point iterations with nonexpansive operators, contractive operators, and nonexpansive operators satisfying a Hölder-type growth condition. We then provide matching complexity lower bounds to establish the exact optimality of the acceleration mechanisms in the nonexpansive and contractive setups. Finally, we provide experiments with CT imaging, optimal transport, and decentralized optimization to demonstrate the practical effectiveness of the acceleration mechanism.
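
    For context, the accelerated methods in this line of work are anchored (Halpern-type) iterations; a minimal numerical sketch of the classical Halpern iteration for a nonexpansive operator is shown below (the 1/(k+2) anchoring weights are the standard choice):

    ```python
    import numpy as np

    def halpern(T, x0, num_iters=1000):
        """Halpern iteration x_{k+1} = b_k*x0 + (1-b_k)*T(x_k), b_k = 1/(k+2).

        For a nonexpansive operator T, this anchored iteration attains an
        O(1/k) rate on the fixed-point residual ||x_k - T(x_k)||, versus
        O(1/sqrt(k)) for the plain averaged (Krasnosel'skii-Mann) iteration.
        """
        x = x0
        for k in range(num_iters):
            beta = 1.0 / (k + 2)
            x = beta * x0 + (1 - beta) * T(x)
        return x

    # Example: T is a 90-degree rotation (nonexpansive, unique fixed point 0).
    R = np.array([[0.0, -1.0], [1.0, 0.0]])
    x_star = halpern(lambda x: R @ x, np.array([1.0, 1.0]))
    print(x_star)  # converges toward the fixed point [0, 0]
    ```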

  • DenForest: Enabling Fast Deletion in Incremental Density-Based Clustering over Sliding Windows

  • Bogyeong Kim
    SNU
  • Density-based clustering is utilized for various applications such as hot spot detection and segmentation. To serve those applications in real time, it is desirable to update clusters incrementally by capturing only the recent data. However, the deletion of data points causes severe performance degradation. In this presentation, the deletion problem in density-based clustering is addressed, and I introduce a novel incremental density-based clustering algorithm called DenForest. By maintaining clusters as a group of spanning trees instead of a graph, DenForest can determine efficiently and accurately, in logarithmic time, whether a cluster is to be split by a point removed from the window. With extensive evaluations, it is demonstrated that DenForest significantly outperforms the state-of-the-art density-based clustering algorithms and achieves clustering quality comparable to that of DBSCAN.

  • Smooth-Swap: A Simple Enhancement for Face-Swapping with Smoothness

  • Jiseob Kim
    SNU
  • Face-swapping models have been drawing attention for their compelling generation quality, but their complex architectures and loss functions often require careful tuning for successful training. We propose a new face-swapping model called 'Smooth-Swap', which excludes complex handcrafted designs and allows fast and stable training. The main idea of Smooth-Swap is to build a smooth identity embedding that can provide stable gradients for identity change. Unlike the embeddings used in previous models, which were trained for a purely discriminative task, the proposed embedding is trained with a supervised contrastive loss that promotes a smoother space. With this improved smoothness, Smooth-Swap needs only a generic U-Net-based generator and three basic loss functions, a far simpler design than previous models. Extensive experiments on face-swapping benchmarks (FFHQ, FaceForensics++) and face images in the wild show that our model is also quantitatively and qualitatively comparable or even superior to existing methods.
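
    A standard supervised contrastive loss (in the style of Khosla et al. 2020) can be sketched as follows; this is an illustrative formulation on identity labels, not the exact Smooth-Swap training code:

    ```python
    import torch
    import torch.nn.functional as F

    def supcon_loss(embeddings, labels, tau=0.1):
        """Supervised contrastive loss over a batch of identity embeddings."""
        z = F.normalize(embeddings, dim=1)       # unit-norm embeddings
        sim = z @ z.t() / tau                    # cosine similarities
        n = z.size(0)
        eye = torch.eye(n, dtype=torch.bool)
        pos = labels.unsqueeze(0).eq(labels.unsqueeze(1)) & ~eye
        sim = sim.masked_fill(eye, float('-inf'))        # exclude self-pairs
        log_prob = sim - sim.logsumexp(dim=1, keepdim=True)
        pos_counts = pos.sum(dim=1)
        valid = pos_counts > 0                   # anchors with >=1 positive
        loss = -(log_prob * pos).sum(dim=1)[valid] / pos_counts[valid]
        return loss.mean()

    emb = torch.randn(16, 128, requires_grad=True)   # identity embeddings
    ids = torch.randint(0, 4, (16,))                 # identity labels
    supcon_loss(emb, ids).backward()
    ```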

  • Mobile-Cloud Cooperative AI Platform for Scalable Live Video Analytics

  • Juheon Yi
    SNU
  • Live video analytics enables various useful services, including Mixed Reality (MR) and autonomous driving. Despite the opportunities, designing systems for scalable and performant live video analytics is highly challenging due to high workload complexity (e.g., high data rates and inference complexity) as well as fast-changing environment dynamics (e.g., network bandwidth and server resource contention). In this talk, I will present my research vision for a mobile-cloud cooperative AI platform for scalable live video analytics. Specifically, I will present my two recent research projects: (i) EagleEye, a mobile-cloud cooperative system for multi-DNN-based AR person identification (published in ACM MobiCom 2020), and (ii) Supremo, a DNN-aware data compression and offloading system for cloud-assisted super-resolution on mobile devices (published in IEEE Transactions on Mobile Computing 2022).

Wednesday, August 3

  • Challenges in Causal Learning and Its Applications

  • Emre Kiciman
    Microsoft Research
  • Causal machine learning methods promise improved generalizability and robustness compared to conventional machine learning approaches by relying on patterns generated by stable and robust causal mechanisms, rather than potentially spurious correlational patterns. However, causal approaches require making crucial assumptions about a system or data-generating process that may be unverifiable in the absence of interventional (experimental) data. We find that eliciting causal assumptions from domain experts and validating or refuting these assumptions are key challenges to the practical application of these methods. This talk describes our research efforts to address these challenges (e.g., by seeking new sources of causal assumptions), as well as experiences with DoWhy, our open-source causal inference library, and its third-party usage.
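
    For reference, a typical DoWhy workflow makes the causal assumptions explicit as a graph, estimates the effect, and then stress-tests the assumptions with refuters (a minimal sketch on toy data; the variable names and DOT graph are illustrative):

    ```python
    import numpy as np
    import pandas as pd
    from dowhy import CausalModel

    # Toy data: treatment t, outcome y, confounder w (illustrative only).
    n = 1000
    w = np.random.normal(size=n)
    t = (w + np.random.normal(size=n) > 0).astype(int)
    y = 2.0 * t + w + np.random.normal(size=n)
    df = pd.DataFrame({"t": t, "y": y, "w": w})

    # 1. Model: causal assumptions are stated explicitly as a graph.
    model = CausalModel(data=df, treatment="t", outcome="y",
                        graph="digraph { w -> t; w -> y; t -> y; }")

    # 2. Identify: derive an estimand from the graph (backdoor criterion).
    estimand = model.identify_effect()

    # 3. Estimate the effect.
    estimate = model.estimate_effect(
        estimand, method_name="backdoor.linear_regression")
    print(estimate.value)  # should be close to the true effect 2.0

    # 4. Refute: stress-test the assumptions, e.g., with a placebo treatment.
    refutation = model.refute_estimate(
        estimand, estimate, method_name="placebo_treatment_refuter")
    print(refutation)
    ```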

  • 3D Reconstruction and Synthesis for New Media

  • Jeong Joon Park
    Stanford
  • Photographs and videos are powerful media for recording and sharing our experiences. But these recordings are static, meaning that one cannot interact with an already recorded video by changing viewpoints or talking to the people inside. In order to interactively visualize a scene, one needs to reconstruct the 3D structure of the scene and represent it with efficient and controllable latent encodings. In this talk, I will introduce my past works towards this goal, each focusing on a different aspect of realizing this new interactive media. First, I will introduce a method that recovers detailed environment lighting from a video of a shiny surface, such as a bag of chips, by combining physical and learning-based components. Next, I will present DeepSDF, a continuous implicit surface representation that enables high-quality shape modeling with deep learning. Lastly, I will present two recent generative methods that provide effective latent representations for 3D and 4D content, respectively. All of these methods enable machines to infer beyond what's recorded, such as rendering a scene from novel viewpoints, completing invisible parts of an object, or synthesizing new 3D scenes.
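
    The core of DeepSDF can be sketched in a few lines: a shared MLP maps a per-shape latent code and a 3D query point to a signed distance value (a simplified sketch; layer sizes are illustrative, and the skip connections and weight normalization of the full model are omitted):

    ```python
    import torch
    import torch.nn as nn

    class DeepSDFSketch(nn.Module):
        """Simplified DeepSDF-style decoder (illustrative sizes)."""
        def __init__(self, latent_dim=256, hidden=512):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(latent_dim + 3, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, 1), nn.Tanh(),  # signed distance in [-1, 1]
            )

        def forward(self, latent, xyz):
            # latent: (B, latent_dim) shape code; xyz: (B, 3) query points.
            return self.net(torch.cat([latent, xyz], dim=-1))

    # The surface of a shape is the zero level set {x : f(z, x) = 0}; at
    # training time the latent codes z are optimized jointly with the MLP
    # (the "auto-decoder" formulation).
    decoder = DeepSDFSketch()
    z = torch.randn(4, 256)            # one latent code per shape
    pts = torch.rand(4, 3) * 2 - 1     # query points in [-1, 1]^3
    sdf = decoder(z, pts)              # (4, 1) signed distance predictions
    ```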

  • Scalable Trustworthy AI: Beyond "what", towards "how"

  • Seong Joon Oh
    University of Tübingen
  • The trustworthiness of AI systems is becoming the crux of the matter. We care not only about whether an AI model is solving the task at hand (the "what" problem) but also about how it is solving the task (the "how" problem). For example, fair models require that certain sensitive attributes (e.g., gender) not be utilized. Robust models require that fragile cues (e.g., background) be excluded from the reasoning. The "how" problem is not only about controlling the cues used for recognition tasks but also about the mechanism and reasoning steps towards the final answer. For example, there is growing demand for models that explain their own reasoning and know when they do not know.

    I will first describe the mainstream approach to the "how" problem: fix the resources (data, annotations, and inductive biases) and find the best solution under them. I am interested in an alternative (dual) approach: first think about whatever extra resources are likely to solve the "how" task (however expensive), and only then think about ways to be economical with those additional resources. I believe there are many interesting possibilities hidden in the second approach. I will introduce my previous work based on this philosophy and discuss future research directions.

Lunch Break · 12:30 - 13:30

  • Fairness-aware Learning with Distillation, Augmentation and Robust Optimization

  • Taesup Moon
    SNU
  • In this talk, I will present three of our recent results on achieving group fairness in machine learning. First, I will talk about MMD-based feature distillation (MFD), which distills predictive features from an unfair teacher model while training a student model with a fairness constraint. As a result, we can show that both the accuracy and fairness of the student model can be improved compared to the teacher model. Second, I will present a method that can effectively learn a fair model when only partially annotated group labels are available in the training set. The resulting scheme can be used to augment the group-labeled training data with group-unlabeled data. Finally, I will talk about how we can extend the distributionally robust optimization (DRO) framework to match the group fairness metric. We theoretically show that our method directly optimizes the fairness metric, not a surrogate for it, and empirically validate that it achieves a state-of-the-art accuracy-fairness trade-off.
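
    As an illustration of the first result, the distillation term can be written as a squared MMD between teacher and student feature batches (a simplified sketch with an RBF kernel, not the exact MFD objective):

    ```python
    import torch

    def mmd2_rbf(x, y, sigma=1.0):
        """Squared MMD between feature batches x and y (RBF kernel)."""
        def k(a, b):
            d2 = torch.cdist(a, b).pow(2)   # pairwise squared distances
            return torch.exp(-d2 / (2 * sigma ** 2))
        return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

    # Distillation-style penalty: pull the student's features toward the
    # teacher's feature distribution (batch size and dims illustrative).
    teacher_feats = torch.randn(64, 128)
    student_feats = torch.randn(64, 128, requires_grad=True)
    penalty = mmd2_rbf(student_feats, teacher_feats)
    penalty.backward()
    ```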

  • Neural Tangent Kernel Analysis of Deep Narrow Neural Networks

  • Ernest K. Ryu
    SNU
  • The tremendous recent progress in analyzing the training dynamics of overparameterized neural networks has primarily focused on wide networks and therefore does not sufficiently address the role of depth in deep learning. In this work, we present the first trainability guarantee of infinitely deep but narrow neural networks. We study the infinite-depth limit of a multilayer perceptron (MLP) with a specific initialization and establish a trainability guarantee using the NTK theory. We then extend the analysis to an infinitely deep convolutional neural network (CNN) and perform brief experiments.
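
    For reference, the empirical NTK underlying such trainability analyses is the Gram matrix of parameter gradients, Theta(x, x') = grad_theta f(x) . grad_theta f(x'); a minimal sketch of computing one entry for a deep, narrow MLP (sizes illustrative):

    ```python
    import torch
    import torch.nn as nn

    def empirical_ntk(model, x1, x2):
        """Empirical NTK entry: inner product of parameter gradients."""
        def grad_vec(x):
            model.zero_grad()
            out = model(x.unsqueeze(0)).squeeze()  # scalar network output
            out.backward()
            return torch.cat([p.grad.flatten() for p in model.parameters()])
        return grad_vec(x1) @ grad_vec(x2)

    # A deep, narrow MLP with scalar output (depth 20, width 8).
    width, depth = 8, 20
    dims = [4] + [width] * depth + [1]
    layers = []
    for i in range(len(dims) - 1):
        layers += [nn.Linear(dims[i], dims[i + 1]), nn.ReLU()]
    model = nn.Sequential(*layers[:-1])   # drop the final ReLU

    print(empirical_ntk(model, torch.randn(4), torch.randn(4)))
    ```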

  • Statistical inference on Topological Data Analysis and Application to Machine Learning

  • Jisu Kim
    Inria
  • Topological Data Analysis generally refers to utilizing topological features from data. Typical examples are the cluster tree and persistent homology. The cluster tree gathers similar data points together to form clusters. Persistent homology quantifies salient topological features that appear at different resolutions of the data. Topological Data Analysis provides useful information, such as delivering scientific insight from data or extracting useful features for learning.

    In this talk, I will present how topological data analysis can be statistically inferred and applied to machine learning. I will first present statistical inference on the cluster tree. Then, I will present how the randomness of the persistent homology computed from data can be statistically quantified and significant topological features can be identified. I will end this talk by presenting how topological data analysis can be applied to machine learning.
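
    As a concrete starting point, the persistent homology of a point cloud can be computed with a library such as GUDHI (a minimal sketch; parameter values are illustrative):

    ```python
    import numpy as np
    import gudhi

    # Sample points from a noisy circle; its persistent homology should
    # show one long-lived 1-dimensional feature (the loop).
    theta = np.random.uniform(0, 2 * np.pi, 200)
    points = np.c_[np.cos(theta), np.sin(theta)] + 0.05 * np.random.randn(200, 2)

    # Build a Vietoris-Rips filtration and compute persistence.
    rips = gudhi.RipsComplex(points=points, max_edge_length=2.0)
    simplex_tree = rips.create_simplex_tree(max_dimension=2)
    diagram = simplex_tree.persistence()

    # Each entry is (dimension, (birth, death)); long intervals are the
    # statistically significant features whose randomness the talk
    # discusses quantifying.
    for dim, (birth, death) in diagram:
        if death - birth > 0.5:
            print(dim, birth, death)
    ```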

Break · 16:15 - 16:35

  • CPR: Classifier-Projection Regularization for Continual Learning

  • Sungmin Cha
    SNU
  • We propose a general yet simple patch that can be applied to existing regularization-based continual learning methods, called classifier-projection regularization (CPR). Inspired by both recent results on neural networks with wide local minima and information theory, CPR adds an additional regularization term that maximizes the entropy of a classifier's output probability. We demonstrate that this additional term can be interpreted as a projection of the conditional probability given by the classifier's output onto the uniform distribution. By applying the Pythagorean theorem for KL divergence, we then prove that this projection may (in theory) improve the performance of continual learning methods. In extensive experiments, we apply CPR to several state-of-the-art regularization-based continual learning methods and benchmark their performance on popular image recognition datasets. Our results demonstrate that CPR indeed promotes wide local minima and significantly improves both accuracy and plasticity while simultaneously mitigating the catastrophic forgetting of the baseline continual learning methods.
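
    The additional term is inexpensive to implement; a minimal PyTorch sketch of adding an output-entropy regularizer to a classification loss (the trade-off weight is an illustrative hyperparameter):

    ```python
    import torch
    import torch.nn.functional as F

    def cpr_loss(logits, targets, beta=0.5):
        """Classification loss plus CPR-style entropy regularization.

        Maximizing the entropy of the classifier's output distribution is
        equivalent (up to a constant) to minimizing its KL divergence to
        the uniform distribution, i.e., projecting the output toward
        uniform. beta is an illustrative trade-off hyperparameter.
        """
        ce = F.cross_entropy(logits, targets)
        probs = F.softmax(logits, dim=-1)
        entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1).mean()
        return ce - beta * entropy  # subtract: we MAXIMIZE entropy

    logits = torch.randn(8, 10, requires_grad=True)
    targets = torch.randint(0, 10, (8,))
    cpr_loss(logits, targets).backward()
    ```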

  • Next-Generation Computer Architecture for Brain-Inspired Computing

  • Hunjun Lee
    SNU
  • Brain-inspired computing aims to understand the mechanisms of the brain and apply them to advance various areas of science. In particular, brain simulation is one of the major topics in the brain-inspired computing domain that can fundamentally improve our understanding of intelligence. As hardware architects, we aim to design a computer architecture that acts as an efficient brain simulation platform to motivate follow-up research using brain simulation. In this talk, I'll share our group's efforts to design a flexible, efficient, and scalable brain simulator, all of which were published in the top four computer architecture conferences (ISCA 2018, MICRO 2019, ASPLOS 2021, and HPCA 2022).

  • Cracking the codes of gene regulation through transformers

  • Dohoon Lee
    SNU
  • In recent years, the application of AI in the field of computational biology has been receiving more attention than ever. One such topic is deciphering the histone code, the combination of chemical modifications on histone proteins, to reveal its role in the regulation of gene expression. In this talk, I will present Chromoformer, a transformer-based architecture for quantitative modeling of the relationship between the histone code and gene expression. I will first describe how biological concepts were incorporated into the proposed hierarchical model architecture based on transformer encoders, and then share how interpreting the trained models can lead to biological insights.
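
    As a rough illustration of the modeling setup (a toy sketch; the actual Chromoformer uses a hierarchical, multi-resolution architecture), a transformer encoder can map binned histone mark signals around a gene to an expression prediction:

    ```python
    import torch
    import torch.nn as nn

    class HistoneToExpressionSketch(nn.Module):
        """Toy model: histone mark signals -> gene expression.

        Input: (batch, num_bins, num_marks) read-depth signals binned
        around the transcription start site (sizes illustrative).
        """
        def __init__(self, num_marks=7, d_model=64, nhead=4, num_layers=2):
            super().__init__()
            self.embed = nn.Linear(num_marks, d_model)
            layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead,
                                               batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
            self.head = nn.Linear(d_model, 1)

        def forward(self, signals):
            h = self.encoder(self.embed(signals))  # attend across bins
            return self.head(h.mean(dim=1))        # pooled -> expression

    model = HistoneToExpressionSketch()
    x = torch.randn(8, 40, 7)   # 8 genes, 40 bins, 7 histone marks
    pred = model(x)             # (8, 1) predicted expression levels
    ```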

Online Lecture Access

※ If the number of participants exceeds the Zoom capacity, please use the YouTube stream.

  • ZOOM webinar access
  • ID : 844 9590 4468
    PW : 286816

  • YOUTUBE live stream
  • @서울대학교 AI 연구원 (Seoul National University AI Institute)

2022 AI Summer School Attendance Reports

Prizes will be awarded to participants who write excellent attendance reports during the AI Summer School.