VAE: Variational Autoencoder

🟣 Intro

This is the first post in a series on representative generative models, and it summarizes the VAE.

⚪ What is a VAE (Variational AutoEncoder)?

  • A VAE is a generative model that learns a probabilistic latent space. A conventional AutoEncoder compresses each input into a single deterministic latent vector, whereas a VAE treats the latent representation as a probability distribution.

VAE

  • Idea: encode the data into a latent space (z), then reconstruct it probabilistically
  • Formula: $p(x, z) = p(x|z)p(z)$
  • Objective: optimize the ELBO (Evidence Lower Bound)
  • Pros: stable training, probability-based generation
  • Cons: generated images can be blurry
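
The factorization $p(x, z) = p(x|z)p(z)$ also says how generation works: draw z from the prior first, then draw x given z. Below is a minimal sketch of that two-step sampling. It assumes a standard normal prior and a Gaussian $p(x|z)$; `toy_decoder`, `sample_joint`, and `noise_std` are hypothetical names, and the fixed linear map only stands in for a trained decoder network.

```python
import random


def toy_decoder(z):
    """Hypothetical stand-in for a trained decoder: maps a 2-d z to a 3-d mean."""
    z1, z2 = z
    return [z1 + z2, z1 - z2, 0.5 * z1]


def sample_joint(rng, noise_std=0.1):
    """Ancestral sampling from p(x, z) = p(x|z) p(z)."""
    # Step 1: z ~ p(z), assumed here to be a standard normal N(0, I).
    z = [rng.gauss(0.0, 1.0) for _ in range(2)]
    # Step 2: x ~ p(x|z), assumed here to be N(toy_decoder(z), noise_std^2 I).
    mean = toy_decoder(z)
    x = [rng.gauss(m, noise_std) for m in mean]
    return x, z


rng = random.Random(0)  # fixed seed so the sketch is reproducible
x, z = sample_joint(rng)
print("z =", z)
print("x =", x)
```

In a real VAE the decoder is a neural network and only this generation path is needed at sampling time; the encoder is used during training.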

์ฒ˜์Œ ๊ณต๋ถ€ํ•  ๋•Œ โ€˜Variationalโ€™์ด๋ผ๋Š” ๋‹จ์–ด๊ฐ€ ์‰ฝ๊ฒŒ ์™€๋‹ฟ์ง€ ์•Š์•„์„œ ๊ณ ์ƒํ–ˆ์—ˆ๋˜ ๋ชจ๋ธ์ด์—ˆ๋‹ค..


⚪ The math

The VAE objective is to maximize the following evidence lower bound (ELBO):

\[\log p(x) \ge \mathbb{E}_{q(z|x)}[\log p(x|z)] - D_{KL}(q(z|x) \| p(z))\]
  • First term: the reconstruction term
  • Second term: pulls the latent distribution $q(z|x)$ toward the prior $p(z)$, typically a standard normal distribution
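
For reference, here is a short sketch of where the bound comes from. It uses only Jensen's inequality and assumes $q(z|x)$ is nonzero wherever $p(x|z)p(z)$ is:

\[\begin{aligned}
\log p(x) &= \log \int p(x|z)\,p(z)\,dz = \log \mathbb{E}_{q(z|x)}\left[\frac{p(x|z)\,p(z)}{q(z|x)}\right] \\
&\ge \mathbb{E}_{q(z|x)}\left[\log \frac{p(x|z)\,p(z)}{q(z|x)}\right] \\
&= \mathbb{E}_{q(z|x)}[\log p(x|z)] - D_{KL}(q(z|x) \| p(z))
\end{aligned}\]

The gap between the two sides is $D_{KL}(q(z|x) \| p(z|x))$, so the bound becomes tight as $q(z|x)$ approaches the true posterior.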

⚪ Architecture summary

  • Encoder: input $x$ → latent distribution $z \sim \mathcal{N}(\mu, \sigma^2)$
  • Decoder: $z$ → reconstructed image $\hat{x}$
  • Loss:
    1. Reconstruction loss: $\|x - \hat{x}\|^2$
    2. Distribution regularization (KL divergence): $D_{KL}(q(z|x) \| p(z))$

⚪ Characteristics

| Item | Description |
|---|---|
| Pros | The latent space is continuous, so it can be sampled from |
| Cons | Generated images tend to be blurry (because of the pixel-wise loss) |
| Applications | Image generation, anomaly detection, disentanglement |

๐ŸŸฃ ๋งˆ์น˜๋ฉฐ

VAE์˜ ํ•ต์‹ฌ ์•„์ด๋””์–ด๋Š” ์ธ์ฝ”๋”๊ฐ€ ์ด๋ฏธ์ง€๋ฅผ ์ž ์žฌ ๊ณต๊ฐ„์˜ ํŠน์ • ํ•œ ์ (a single point) ์œผ๋กœ ๋งคํ•‘ํ•˜๋Š” ๋Œ€์‹ , ํ™•๋ฅ  ๋ถ„ํฌ(a probability distribution) ๋กœ ๋งคํ•‘ํ•˜๋„๋ก ํ•ด์„œ latent space๊ฐ€ ๋ถ€๋“œ๋Ÿฝ๊ฒŒ ์ฑ„์›Œ์ง€๋„๋ก ํ–ˆ๋‹ค.

Working through this idea helps with understanding generative models in deep learning, and it is also a worthwhile way to build up statistical skills.


This post is licensed under CC BY 4.0 by the author.

© 2025 Soohyun Jeon ⭐

🌱 Mostly to remember, sometimes to understand.