
Thread starter: mingdashike22

[Electrical Engineering and Systems Science] Audio-Visual Scene Analysis with Self-Supervised Multisensory Features


mingdashike22 (employment verified), posted 2022-3-9 10:14:00 from mobile

    ---
Title:
    《Audio-Visual Scene Analysis with Self-Supervised Multisensory Features》
    ---
Authors:
    Andrew Owens, Alexei A. Efros
    ---
Most recent submission year:
    2018
    ---
Classification:

Primary category: Computer Science
Secondary category: Computer Vision and Pattern Recognition
Description: Covers image processing, computer vision, pattern recognition, and scene understanding. Roughly includes material in ACM Subject Classes I.2.10, I.4, and I.5.
--
Primary category: Computer Science
Secondary category: Sound
Description: Covers all aspects of computing with sound, and sound as an information channel. Includes models of sound, analysis and synthesis, audio user interfaces, sonification of data, computer music, and sound signal processing. Includes ACM Subject Class H.5.5, and intersects with H.1.2, H.5.1, H.5.2, I.2.7, I.5.4, I.6.3, J.5, K.4.2.
--
Primary category: Electrical Engineering and Systems Science
Secondary category: Audio and Speech Processing
Description: Theory and methods for processing signals representing audio, speech, and language, and their applications. This includes analysis, synthesis, enhancement, transformation, classification and interpretation of such signals as well as the design, development, and evaluation of associated signal processing systems. Machine learning and pattern analysis applied to any of the above areas is also welcome. Specific topics of interest include: auditory modeling and hearing aids; acoustic beamforming and source localization; classification of acoustic scenes; speaker separation; active noise control and echo cancellation; enhancement; de-reverberation; bioacoustics; music signals analysis, synthesis and modification; music information retrieval; audio for multimedia and joint audio-video processing; spoken and written language modeling, segmentation, tagging, parsing, understanding, and translation; text mining; speech production, perception, and psychoacoustics; speech analysis, synthesis, and perceptual modeling and coding; robust speech recognition; speaker recognition and characterization; deep learning, online learning, and graphical models applied to speech, audio, and language signals; and implementation aspects ranging from system architecture to fast algorithms.
--

    ---
Abstract:
      The thud of a bouncing ball, the onset of speech as lips open -- when visual and audio events occur together, it suggests that there might be a common, underlying event that produced both signals. In this paper, we argue that the visual and audio components of a video signal should be modeled jointly using a fused multisensory representation. We propose to learn such a representation in a self-supervised way, by training a neural network to predict whether video frames and audio are temporally aligned. We use this learned representation for three applications: (a) sound source localization, i.e. visualizing the source of sound in a video; (b) audio-visual action recognition; and (c) on/off-screen audio source separation, e.g. removing the off-screen translator's voice from a foreign official's speech. Code, models, and video results are available on our webpage: http://andrewowens.com/multisensory
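The self-supervised objective described in the abstract (train a network to predict whether video frames and audio are temporally aligned) implies a simple way to manufacture labels from unlabeled video: a positive example pairs a clip with its own audio, while a negative example pairs it with audio shifted in time. A minimal numpy sketch of this pair construction, with all names and parameters illustrative rather than taken from the paper's released code:

```python
import numpy as np

def make_alignment_pairs(video, audio, samples_per_frame, shift_frames):
    """Build one aligned (label 1) and one misaligned (label 0) training pair.

    video: (T, ...) array of T synchronized frames
    audio: (T * samples_per_frame,) waveform synchronized with the frames
    shift_frames: temporal offset (in frames) used to create the negative pair
    """
    n_frames = video.shape[0]
    assert audio.shape[0] == n_frames * samples_per_frame

    # Positive: the clip with its own, temporally aligned audio.
    pos = (video, audio, 1)

    # Negative: the same clip, audio circularly shifted by `shift_frames`.
    shift = shift_frames * samples_per_frame
    neg = (video, np.roll(audio, shift), 0)
    return pos, neg

# Toy example: 8 "frames" and a matching waveform.
rng = np.random.default_rng(0)
video = rng.standard_normal((8, 4, 4))   # 8 frames of 4x4 pixels
audio = rng.standard_normal(8 * 100)     # 100 audio samples per frame
pos, neg = make_alignment_pairs(video, audio, 100, shift_frames=2)
print(pos[2], neg[2])  # labels: 1 0
```

A fused audio-visual network trained to classify such pairs needs no manual annotation, which is what makes the representation self-supervised; the learned features can then be reused for the three downstream applications the abstract lists.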
    ---
PDF link:
    https://arxiv.org/pdf/1804.03641

Keywords: Presentation, localization, Applications, cancellation, Segmentation, audio, screen, model, available, event
