DeepASMR 😴

Contents

Introduction

ASMR (Autonomous Sensory Meridian Response) is known to induce psychological calm or pleasure through visual, auditory, tactile, and other sensory channels. The goal of this study is to use deep learning to produce ASMR that gives listeners a sense of emotional stability. As ASMR has gained real popularity in recent years, results of scientific experiments on its effects are being published, and positive reviews from users who have experienced ASMR are easy to find on social media. This study focuses on the effects ASMR provides, such as calm, sleep induction, and improved concentration, and proposes DeepASMR, which generates new ASMR that reflects user preferences.

Abstract

Among raw audio formats, we compared waveform and MIDI data, and additionally analyzed the difference between the spectrogram and the mel spectrogram that can be derived from a waveform while carrying out audio preprocessing. We also trained generative models (VAE, VQ-VAE, WaveNet, MelNet) on ASMR recordings that contain background noise, and studied the audio signal through the characteristics of the ASMR sound generated by each model and through qualitative analysis.

Member

이상훈
전진환
박은영

Music Generation Project

  • Music Information Retrieval
  • Music information retrieval (MIR) is known as an interdisciplinary field concerned with extracting information from music. MIR has many real-world applications and draws on musicology, psychoacoustics, psychology, signal processing, informatics, and especially machine learning, where it appears as tasks such as classification and generation. Among these, automatic music generation is a challenging topic for many MIR researchers, and attempts so far have had only limited success.

  • AI Music Generation
  • Until quite recently, doubts were raised about the creative ability of artificial intelligence. Many held that 'emotion', which distinguishes humans from machines, is the most important element in the arts, and that machines cannot understand emotion, a uniquely human domain. Recently, however, AI projects that use deep learning to compose music, led by global IT companies, have produced surprising results, and one expert has predicted: "Given the pace at which AI-produced music is progressing, composing by hand looks set to become old-fashioned within 10 years."

    OpenAI's MuseNet (2019) and Jukebox (2020) and Google's Magenta (2017) project are representative music AI systems that make the best use of this deep learning approach. MuseNet was trained on a large amount of MIDI data using GPT-2 and Sparse Transformer models, and creates new music with 10 instruments. Jukebox, based on the VQ-VAE-2 model, showed that by capturing the long-range structure and high diversity of waveforms it can generate sound quite similar to the original.

    Finally, Google's Magenta used the MusicVAE model, which encodes music sequence data into a compact latent vector that captures its musical characteristics and then decodes that vector back into a music sequence. Music produced by such AI is still generally distinguishable from human-made music and cannot be said to surpass existing music. But as time passes, as more music data is learned, and as computing and hardware advance, it is hard to gauge what kind of music AI will be able to create.


Prior work

  • Deep Learning based ASMR
  • We propose DeepASMR, a platform that uses deep learning to collect and classify existing ASMR recordings and to generate new ASMR recordings based on user preferences. For ASMR classification and recognition, DeepASMR builds DNN models that improve on existing DNNs for music recognition and noise recognition, raising classification accuracy to over 95%.


    New ASMR recordings are produced by transforming or synthesizing existing ASMR recordings with DNNs. To this end, ASMR-generating DNN models were built using the VAE (Variational Autoencoder) and GAN (Generative Adversarial Network) approaches. When the generated ASMR recordings were fed into our classification DNN model for verification, accuracy exceeded 70%, which suggests that the proposed DNN models generated good-quality ASMR recordings.

  • ASMR Generation Demo

MIDI and Waveform

    WAV

  • Waveform Audio File Format
  • A waveform can be drawn as a 2D image at millisecond (1/1000 of a second) resolution, but because the time axis grows long, the data becomes large.
  • WAV is an uncompressed format, so files are large compared with other formats, but it has advantages in simplicity and quality.
  • Because it is uncompressed, it has the drawback of taking up a lot of unnecessary space.
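As a rough illustration of why uncompressed audio is large, the size of the PCM data in a WAV file follows directly from the sample rate, bit depth, and channel count. The sketch below assumes CD-style parameters (44.1 kHz, 16-bit, stereo); these are illustrative values, not settings taken from this project, and the function name is hypothetical.

```python
def pcm_wav_bytes(seconds, sample_rate=44100, bits_per_sample=16, channels=2):
    """Approximate size of the raw PCM payload of a WAV file (header excluded)."""
    bytes_per_sample = bits_per_sample // 8
    return int(seconds * sample_rate * channels * bytes_per_sample)


# Assumed CD-style settings; one minute of audio.
one_minute = pcm_wav_bytes(60)
print(one_minute)              # 10584000 bytes
print(one_minute / 1_000_000)  # about 10.6 MB
```

Halving the sample rate or switching to mono halves the size, which is one reason audio is often downsampled before training.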

    MIDI

  • Musical Instrument Digital Interface
  • A MIDI file stores instrument notes, loudness, and similar performance data, and can hold information for many different instruments.
  • A typical MIDI note message consists of three 8-bit bytes (status, note number, velocity), with duration given by the time between note-on and note-off events, so MIDI is far more compact than a waveform.
  • The data is harder to obtain than wave data, and it is difficult to work with without musical domain knowledge.
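To make the compactness concrete, here is a minimal sketch of the raw bytes for one note. It assumes standard MIDI channel voice messages (note-on status `0x90`, note-off status `0x80`); the helper names are hypothetical, and the delta-time fields that a MIDI file stores between events are left out.

```python
def note_on(note, velocity, channel=0):
    """Build a 3-byte MIDI note-on message: status byte, note number, velocity."""
    if not (0 <= note <= 127 and 0 <= velocity <= 127 and 0 <= channel <= 15):
        raise ValueError("note and velocity must be 0-127, channel 0-15")
    return bytes([0x90 | channel, note, velocity])


def note_off(note, channel=0):
    """Build a 3-byte MIDI note-off message (release velocity set to 0)."""
    if not (0 <= note <= 127 and 0 <= channel <= 15):
        raise ValueError("note must be 0-127, channel 0-15")
    return bytes([0x80 | channel, note, 0])


# Middle C (note 60) pressed and released: six bytes in total.
middle_c = note_on(60, 100) + note_off(60)
print(middle_c.hex())  # 903c64803c00
print(len(middle_c))   # 6
```

For comparison, one second of 16-bit mono audio at 44.1 kHz takes 88,200 bytes regardless of how many notes it contains.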

Audio Processing

  • Waveform
  • Waveform

    Sound is produced by the compression of air caused by vibration. It is represented as a wave: a phenomenon that oscillates according to the degree of compression and propagates through space or a medium.
    Information that can be obtained from a waveform:
    Phase (degrees of displacement)
    Amplitude (intensity)
    Frequency

    Fourier Transform

    The Fourier transform decomposes an arbitrary input signal into a sum of periodic functions with different frequencies.
    In the animation, f is decomposed into its Fourier series, shown in blue. Arranging these sinusoids by frequency gives, as in the latter part of the animation, a set of Dirac delta functions. The function in the frequency domain is denoted f̂.
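To make the decomposition concrete, the following sketch applies a naive discrete Fourier transform (the sampled counterpart of the transform described above) to a sum of two sinusoids. The test signal, its length, and the function name are illustrative assumptions; a real pipeline would use an FFT library instead of this O(N^2) loop.

```python
import cmath
import math


def dft(signal):
    """Naive O(N^2) discrete Fourier transform of a real or complex sequence."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


n = 64
# Sum of two sinusoids at 5 and 12 cycles per window (assumed test signal).
signal = [
    math.sin(2 * math.pi * 5 * t / n) + 0.5 * math.sin(2 * math.pi * 12 * t / n)
    for t in range(n)
]
magnitudes = [abs(c) for c in dft(signal)]

# The two strongest bins in the first half of the spectrum.
peaks = sorted(range(n // 2), key=lambda k: magnitudes[k], reverse=True)[:2]
print(sorted(peaks))  # [5, 12]
```

The energy concentrates in exactly the two bins that match the input frequencies, which is the discrete analogue of the delta-function picture.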
  • Spectrogram
  • Magnitude spectrogram of a piano recording

    Digital representations of sound come in many forms. Sound is usually stored by encoding the shape of the waveform as it changes over time. Because a waveform is hard to analyze directly, we can compute the Fourier transform over overlapping windows of the waveform and arrange the results in a matrix, which is visualized as a spectrogram. Compared with the waveform, a spectrogram makes it easy to see the local frequency content of the signal as it changes over time.
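The windowed transform described above can be sketched in plain Python as follows. The window length, hop size, Hann window, and test signal are all assumptions chosen for illustration; the project itself may have used a library routine with different settings.

```python
import cmath
import math


def stft(signal, window_size=64, hop=16):
    """Short-time Fourier transform: one DFT per overlapping Hann-windowed frame.

    Returns a list of frames; each frame holds window_size // 2 + 1 complex bins.
    """
    hann = [0.5 - 0.5 * math.cos(2 * math.pi * i / window_size)
            for i in range(window_size)]
    frames = []
    for start in range(0, len(signal) - window_size + 1, hop):
        chunk = [signal[start + i] * hann[i] for i in range(window_size)]
        frames.append([
            sum(chunk[t] * cmath.exp(-2j * math.pi * k * t / window_size)
                for t in range(window_size))
            for k in range(window_size // 2 + 1)
        ])
    return frames


# Assumed test signal: 4 cycles per window in the first half, 16 in the second.
n = 512
signal = [math.sin(2 * math.pi * (4 if t < n // 2 else 16) * t / 64)
          for t in range(n)]

spec = stft(signal)
magnitude = [[abs(c) for c in frame] for frame in spec]       # magnitude spectrogram
phase = [[cmath.phase(c) for c in frame] for frame in spec]   # phase spectrogram

first, last = magnitude[0], magnitude[-1]
print(first.index(max(first)), last.index(max(last)))  # 4 16
```

The peak bin moves from 4 to 16 between the first and last frames, which is the kind of time-varying frequency content that is hard to read off the raw waveform.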

    A spectrogram is complex-valued: it represents both the amplitude and the phase of each frequency component at each point in time. The figure above is the magnitude spectrogram, and the figure below visualizes the phase spectrogram. In the phase spectrogram the local frequency content is hard to see by eye and the values look randomly distributed, whereas the magnitude spectrogram clearly shows local structure over time.

    the corresponding phase spectrogram

    For this reason, the phase component is often discarded when extracting useful information from audio signals. This is in fact why the magnitude spectrogram is often referred to simply as the "spectrogram". When generating sound, however, phase is very important because it has a meaningful effect on how the sound is perceived. Details on how phase information is used can be found at the link below.

    More Detail
  • Mel Filter
  • Mel Scale

    The basic idea of the mel filter is that human hearing is less sensitive to frequencies above 1000 Hz, so frequencies up to 1000 Hz are mapped linearly and those above are mapped on a log scale.

    The hertz scale does not reflect human sensitivity or the ability to tell pitches apart. For example, when listening to 5000 Hz and 8000 Hz, a person does not perceive the difference as a gap of 3000 Hz. The mel scale converts frequency to a scale that matches what people can readily perceive. As in the figure below, the mapping is linear up to 1000 Hz; beyond that, mel-scale triangular filters are built and multiplied with the spectrum.

    Typically 26 or 40 filter banks are used. Within each filter bank band, the energy values (spectral power) are summed and the log is taken. Log scaling is applied to amplitude as well as to frequency because human perception responds on a log scale in amplitude too. The result is one log power value per mel-scale bin, as many as there are filter banks.
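The sum-then-log step can be sketched as below. To keep the example short, the triangular filters are defined directly by hand-picked bin indices and not by mel-spaced edges, and the power spectrum is a toy vector; both are assumptions for illustration, as are the function names.

```python
import math


def triangular_filter(n_bins, left, center, right):
    """Triangular weights that rise from `left` to `center` and fall to `right`."""
    weights = [0.0] * n_bins
    for k in range(left, right + 1):
        if k <= center and center > left:
            weights[k] = (k - left) / (center - left)
        elif k > center and right > center:
            weights[k] = (right - k) / (right - center)
    weights[center] = 1.0
    return weights


def log_filterbank_energies(power_spectrum, filters, floor=1e-10):
    """Weighted sum of spectral power under each filter, followed by a log."""
    energies = []
    for weights in filters:
        energy = sum(w * p for w, p in zip(weights, power_spectrum))
        energies.append(math.log(max(energy, floor)))  # floor avoids log(0)
    return energies


# Toy power spectrum with 16 bins and one strong component at bin 3.
power = [0.01] * 16
power[3] = 100.0

# Three overlapping filters; the edges are hand-picked, not mel-spaced.
edges = [(0, 2, 4), (2, 4, 8), (4, 8, 15)]
filters = [triangular_filter(16, *e) for e in edges]

features = log_filterbank_energies(power, filters)
print(len(features))  # 3, one log energy per filter
```

The first two filters overlap bin 3 and therefore report high energy, while the third covers only the quiet upper bins.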

    Returning to frequency, the mel scale aims to mimic the human ear's nonlinear perception of sound by discriminating more finely at low frequencies and less finely at high frequencies. The following equations convert between hertz (f) and mel (m). A mel spectrogram is a spectrogram whose frequency axis has been converted to the mel scale.
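One commonly used form of the hertz-to-mel conversion is m = 2595 · log10(1 + f / 700), with inverse f = 700 · (10^(m / 2595) − 1). The sketch below assumes this HTK-style variant; other variants exist, and the constants here are not confirmed as the ones used in this project. The helper names and the 0 to 8000 Hz range are likewise illustrative.

```python
import math


def hz_to_mel(f):
    """Convert frequency in hertz to mels (HTK-style formula, assumed here)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_center_frequencies(n_filters, f_min, f_max):
    """Center frequencies in Hz of filters spaced evenly on the mel scale."""
    m_min, m_max = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (m_max - m_min) / (n_filters + 1)
    return [mel_to_hz(m_min + step * (i + 1)) for i in range(n_filters)]


print(round(hz_to_mel(1000.0)))  # 1000: the two scales agree near 1 kHz

# The same 3000 Hz gap covers fewer mels at higher frequencies.
low_gap = hz_to_mel(4000.0) - hz_to_mel(1000.0)
high_gap = hz_to_mel(8000.0) - hz_to_mel(5000.0)
print(low_gap > high_gap)  # True

# 26 filters over an assumed 0-8000 Hz range: dense at the bottom, sparse at the top.
centers = mel_center_frequencies(26, 0.0, 8000.0)
```

Evenly spaced mel values map back to hertz values that crowd together at low frequencies, which is where the triangular filters end up narrowest.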

  • Reconstruction
  • Inverting the spectrogram

    Inverting the mel spectrogram


ASMR Project

  • Architecture
  • ASMR feature extraction
  • ASMR (Fire) sound

    Human voice sound


Generative Model

  • Variational AutoEncoder (VAE)
  • Variational AutoEncoder

    • A VAE is a model that generates new data by sampling from a model distribution that matches the distribution of the training data.
    • Given training data with distribution p_data(x), the model distribution p_model(x) is assumed to match it.
    • Sampling from p_model(x) at inference time produces a new x.
    • The encoder outputs the μ and σ of the latent variable's distribution, and these outputs define a probability density function over the latent variable.
    • The reparameterization trick is used so that gradients can be backpropagated through the sampled values.
    • The decoder takes the sampled z as input and reconstructs an output with the same shape as the encoder's input.
    More Detail
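The sampling step in the list above can be sketched without any deep learning framework. This is a minimal illustration of the reparameterization trick and the usual KL regularizer, assuming a diagonal Gaussian posterior parameterized by mean and log-variance (a common choice, not confirmed as this project's); the numeric values stand in for real encoder outputs.

```python
import math
import random


def reparameterize(mu, log_var, rng=random):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), element-wise.

    The randomness lives in eps, so z is a deterministic, differentiable
    function of mu and log_var.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, sigma^2) || N(0, 1) ), summed over latent dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0
                     for m, lv in zip(mu, log_var))


rng = random.Random(0)                  # fixed seed for repeatability
mu, log_var = [0.5, -1.0], [0.0, -2.0]  # stand-ins for encoder outputs

z = reparameterize(mu, log_var, rng)
kl = kl_to_standard_normal(mu, log_var)
print(z)
print(kl)
```

In training, the KL term is added to the reconstruction loss, and z is what the decoder receives.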
  • VAE Animation
  • VAE Latent Space


    VAE Result

    Originals Rain ASMR

    Generated Rain ASMR


  • Vector-Quantized Variational Autoencoders (VQ-VAE)
  • VQ-VAE Structure

    • VQ-VAE is a generative model that combines the VAE with discrete representations.
    • It introduces a vector quantization layer that maps the encoder output to a discrete representation space.
    • Using vector quantization (VQ) avoids the difficulties caused by excessively large variance, makes training easier, and sidesteps the posterior collapse problem.
    • It performs on par with models that use continuous representations while offering the flexibility of discrete representations.
    • With VQ, the model avoids "posterior collapse", in which the latents are ignored when they are paired with a powerful autoregressive decoder.
    • The WaveNet model described next can be used as the decoder.
    More Detail
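The quantization layer's forward pass amounts to a nearest-neighbor lookup in a codebook. The sketch below shows only that lookup, with a hypothetical four-entry 2-D codebook and made-up encoder outputs; the straight-through gradient and the codebook and commitment losses used in training are omitted.

```python
def quantize(vector, codebook):
    """Return (index, code) of the codebook entry nearest to `vector` (squared L2)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    index = min(range(len(codebook)), key=lambda i: sq_dist(vector, codebook[i]))
    return index, codebook[index]


# Hypothetical 2-D codebook with four entries.
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Made-up continuous encoder outputs, one per time step.
encoder_outputs = [[0.1, -0.2], [0.9, 0.8], [0.2, 0.7]]

indices = [quantize(v, codebook)[0] for v in encoder_outputs]
print(indices)  # [0, 3, 2]
```

The sequence of integer indices is the discrete latent code, and the decoder receives the corresponding codebook vectors.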

    Vector Quantization

    Discrete Spaces

    Two-Dimensional Vector Quantization Animation


    Wavenet

    Wavenet

    The model predicts the distribution of potential signal values for each timestep, given past signal values.
    • WaveNet is an autoregressive model that generates audio waveforms.
    • It expresses the waveform as a joint probability distribution and models it with a stack of convolutional layers.
    • It generates waveforms by mapping time-series data causally with 1D convolutions.
    • Because earlier samples matter for a waveform, it uses dilated convolutions to widen the receptive field.
    • Internally it uses residual blocks and skip connections.
    • It is used for TTS, music, voice conversion, and more.
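A small sketch can show how stacking dilated causal convolutions widens the receptive field. It uses kernel size 2 and dilations 1, 2, 4, 8 as an assumed configuration, with all weights set to 1 so the reach of an impulse is easy to see; gated activations, residual and skip connections, and the output distribution are left out.

```python
def causal_dilated_conv(x, weights, dilation):
    """1-D causal convolution: y[t] = sum_k weights[k] * x[t - k * dilation].

    Positions before the start of the sequence count as zero, so y[t]
    never depends on any sample after t.
    """
    y = []
    for t in range(len(x)):
        total = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:
                total += w * x[idx]
        y.append(total)
    return y


def receptive_field(kernel_size, dilations):
    """Number of input samples that can influence one output sample."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


dilations = [1, 2, 4, 8]  # assumed stack; doubling per layer
x = [0.0] * 32
x[0] = 1.0                # unit impulse at t = 0

h = x
for d in dilations:
    h = causal_dilated_conv(h, [1.0, 1.0], d)

print(receptive_field(2, dilations))                 # 16
print(max(t for t, v in enumerate(h) if v != 0.0))   # 15
```

Four layers reach 16 samples, whereas four undilated layers with the same kernel would reach only 5, which is why dilation is attractive for long audio contexts.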

    VQ-VAE With Wavenet

    Reconstruction

    These samples are the result of compressing the input audio 64x into discrete latent codes with the VQ-VAE model. VQ-VAE is trained end to end from the waveform itself, without relying on any prior knowledge. Although the reconstructed waveform looks quite different in shape from the original, the audio is still quite similar.
    Originals and reconstructions

    Samples from Prior

    The discrete latent space captures important features of the audio, such as the content of speech, in a highly compressed symbolic representation. Because of this, another WaveNet can be trained on top of these latents, where it can model long-term dependencies effectively. With enough data, a single model could even learn a language model directly from raw, unprocessed audio.
    Sample Audio

    VQ-VAE Result

    Originals Fire ASMR

    Generated Fire ASMR


    Melnet

    Melnet

    An autoregressive model describes a complex high-dimensional distribution by modeling a sequence of simpler distributions. This approach, in which the model is trained to predict one element at a time, has been applied successfully to many data modalities, including images, text, and waveforms. MelNet applies the same approach to modeling spectrograms. Just as autoregressive image models such as PixelCNN estimate a distribution pixel by pixel over the spatial dimensions of an image, MelNet estimates a distribution element by element over the time and frequency dimensions of a spectrogram.

    MelNet experiments with two different autoregressive orderings. The first is time-major ordering, which traverses each spectrogram frame from low to high frequency before moving on to the next frame. The second is multiscale ordering.
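The time-major ordering can be written out as a simple enumeration. The sketch below lists the order in which spectrogram elements would be generated and the context available to each one; the 3-frame, 4-bin grid and the function names are illustrative assumptions.

```python
def time_major_order(n_frames, n_bins):
    """Yield (frame, bin) pairs: low to high frequency within a frame, then the next frame."""
    for t in range(n_frames):
        for f in range(n_bins):
            yield t, f


def context_of(position, n_bins):
    """Elements generated before `position` under the time-major ordering."""
    t, f = position
    return [(tt, ff) for tt, ff in time_major_order(t + 1, n_bins)
            if (tt, ff) < (t, f)]


order = list(time_major_order(3, 4))
print(order[:6])  # [(0, 0), (0, 1), (0, 2), (0, 3), (1, 0), (1, 1)]

# Element (frame 1, bin 2) is conditioned on all of frame 0 plus bins 0-1 of frame 1.
print(len(context_of((1, 2), 4)))  # 6
```

Each element is predicted from everything earlier in this list, which is the chain-rule factorization that autoregressive models rely on.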

    Time major ordering

    Upsampling

    Multiscale Modelling

    One drawback of autoregressive models is that they tend to learn local structure much better than global structure. When images are modeled in standard row-major order, the result is samples with realistic textures that lack coherent object-level structure and higher-level scene composition. The problem is especially pronounced when modeling high-dimensional distributions. Because MelNet aims to model spectrograms with hundreds of thousands of dimensions, countermeasures against this effect are essential. To this end it uses a multiscale model that generates spectrograms in a coarse-to-fine order. A low-resolution, subsampled spectrogram that captures high-level structure is generated first, followed by an iterative upsampling procedure that adds high-resolution detail. Generating spectrograms this way decouples the tasks of learning local structure and learning global structure.
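The coarse-to-fine idea can be illustrated with a toy split-and-interleave routine. This is a simplified sketch and not MelNet's actual procedure: it only shows how a grid can be halved along alternating axes into a coarse tier plus detail tiers, and put back together exactly. In the real model each detail tier would be generated by a network conditioned on the coarser tiers; here the stored details are simply reused. All names and the 8x8 grid are assumptions.

```python
def split(grid, axis):
    """Split a 2-D list into (even, odd) halves along rows (axis=0) or columns (axis=1)."""
    if axis == 0:
        return grid[0::2], grid[1::2]
    return [row[0::2] for row in grid], [row[1::2] for row in grid]


def interleave(even, odd, axis):
    """Inverse of split: weave the two halves back together."""
    if axis == 0:
        out = []
        for i in range(len(even)):
            out.append(even[i])
            if i < len(odd):
                out.append(odd[i])
        return out
    out = []
    for e_row, o_row in zip(even, odd):
        row = []
        for j in range(len(e_row)):
            row.append(e_row[j])
            if j < len(o_row):
                row.append(o_row[j])
        out.append(row)
    return out


def build_tiers(spec, n_splits):
    """Repeatedly halve the grid, alternating axes; return coarse part and details."""
    details = []
    coarse = spec
    for i in range(n_splits):
        axis = i % 2
        coarse, detail = split(coarse, axis)
        details.append((axis, detail))
    return coarse, details


def reconstruct(coarse, details):
    """Upsample from the coarsest tier by re-inserting each detail tier."""
    for axis, detail in reversed(details):
        coarse = interleave(coarse, detail, axis)
    return coarse


# Toy 8x8 "spectrogram": rows are time frames, columns are frequency bins.
spec = [[t * 10 + f for f in range(8)] for t in range(8)]
coarse, details = build_tiers(spec, 3)
print(len(coarse), len(coarse[0]))         # 2 4
print(reconstruct(coarse, details) == spec)  # True
```

The coarse tier is small enough to model global structure cheaply, and each upsampling step only has to add local detail.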


    Melnet result

    ASMR Rain Upsampling