Development of Thai Image Captioning Method Using Deep Learning

Please use this identifier to cite or link to this item: http://ithesis-ir.su.ac.th/dspace/handle/123456789/5321

Full metadata record

DC Field	Value	Language
dc.contributor	Witchaphon TIEANCHO	en
dc.contributor	วิชญ์พล เทียนชอ	th
dc.contributor.advisor	SOPON PHUMEECHANYA	en
dc.contributor.advisor	โสภณ ผู้มีจรรยา	th
dc.contributor.other	Silpakorn University	en
dc.date.accessioned	2024-08-13T06:44:52Z	-
dc.date.available	2024-08-13T06:44:52Z	-
dc.date.created	2024
dc.date.issued	28/6/2024
dc.identifier.uri	http://ithesis-ir.su.ac.th/dspace/handle/123456789/5321	-
dc.description.abstract	This thesis designs and develops a deep learning model that generates Thai image captions. A Convolutional Neural Network (CNN), such as VGG16 and others, extracts the image features, and a Bidirectional LSTM generates the captions; the CNN serves as the encoder and the Bidirectional LSTM as the decoder. A Bidirectional LSTM is a variant of LSTM that lets the model learn in two directions, forward and backward, which helps it learn and distinguish similar words and improves its memory capacity. Two datasets are used for training and testing. The first is Flickr8k, a public dataset of 8091 images with 5 English captions per image, which are first translated into Thai using Google Translate. Most of its images and captions relate to everyday life. The second is a custom-made traffic dataset of 429 images with 5 Thai captions. Its images and captions relate to road traffic, for example "A girl was walking across the road" and "A red light warns all cars and motorcycles to stop." This dataset was created in the hope that future research can build on this work to develop a warning system for drivers and for other people traveling on the road, not just drivers. That system would issue an audio notification when the model receives an image as input, but it is outside the scope of this thesis. The experiments combine the two datasets, both to obtain results for general image description in addition to traffic scenes and because combining the datasets enhances the model's learning. Finally, the captions generated by the model are evaluated against the reference captions using the BLEU metric.	en
dc.description.abstract	วิทยานิพนธ์เล่มนี้ได้ออกแบบและพัฒนาโมเดลการเรียนรู้เชิงลึกเพื่อสร้างคำบรรยายภาพภาษาไทยโดยใช้ Convolutional Neural Network (CNN) อย่างเช่น VGG16 และอื่นๆ เพื่อคัดแยกคุณลักษณะของรูปภาพและได้ใช้ Bidirectional LSTM ในการสร้างคำบรรยายภาพ โดยที่ CNN คือกระบวนการในการเข้ารหัส และ Bidirectional LSTM คือกระบวนการในการถอดรหัส ซึ่ง Bidirectional LSTM คือ LSTM อีกประเภทที่ช่วยให้โมเดลสามารถเรียนรู้ได้แบบสองทิศทางคือ ทิศทางไปข้างหน้าและทิศทางย้อนกลับทำให้โมเดลเรียนรู้และแยกแยะคำที่มีความคล้ายคลึงกันได้รวมถึงเพิ่มความสามารถของหน่วยความจำโมเดล และในส่วนของชุดข้อมูลที่ใช้สำหรับการฝึกสอนและทดสอบประกอบด้วย ฐานข้อมูลแรกคือ Flickr8k ซึ่งเป็นฐานข้อมูลสาธารณะที่ภายในฐานข้อมูลประกอบไปด้วยรูปภาพจำนวน 8091 รูป และคำบรรยายภาษาอังกฤษ 5 คำบรรยายซึ่งจะทำการแปลคำบรรยายเป็นภาษาไทยโดยใช้ Google Translate ก่อน โดยส่วนใหญ่ฐานข้อมูลชุดนี้จะเป็นรูปภาพและคำบรรยายที่เกี่ยวกับชีวิตประจำวันทั่วไป และฐานข้อมูลที่สองคือ ชุดข้อมูลการจราจรที่จัดทำขึ้นเองซึ่งภายในจะประกอบไปด้วยรูปภาพ 429 รูป และคำบรรยายภาษาไทย 5 คำบรรยาย โดยฐานข้อมูลชุดนี้คือรูปภาพและคำบรรยายที่เกี่ยวข้องกับการสัญจรบนท้องถนนอย่างเช่น เด็กผู้หญิงคนหนึ่งกำลังเดินข้ามถนน ไฟแดงเตือนให้รถยนต์และรถจักรยานยนต์ทุกคันต้องหยุด ซึ่งเหตุผลที่ได้จัดทำชุดข้อมูลนี้เพราะว่าวิทยานิพนธ์เล่มนี้หวังว่างานวิจัยชุดนี้ในอนาคตจะสามารถทำการสร้างระบบแจ้งเตือนให้กับผู้ขับขี่บนท้องถนนหรือแม้แต่ผู้ที่สัญจรอยู่ตามท้องถนนไม่ใช่กับผู้ขับขี่อย่างเดียวซึ่งระบบการแจ้งเตือนนั้นจะเป็นการแจ้งเตือนด้วยเสียงเมื่อโมเดลรับอินพุตภาพเข้ามาแล้วแต่วิทยานิพนธ์ฉบับนี้ไม่ได้ทำไปจนถึงระบบนั้น ดังนั้นการทดลองของวิทยานิพนธ์ฉบับนี้จะทำการรวมชุดข้อมูลทั้งสองเข้าด้วยกันเพราะไม่เพียงแต่ต้องการดูผลลัพธ์ที่เกี่ยวข้องกับการจราจรแต่ต้องการดูผลลัพธ์การบรรยายรูปภาพทั่วไปด้วยอีกทั้งการรวมชุดข้อมูลเข้าด้วยกันยังช่วยเสริมการเรียนรู้ให้กับโมเดลด้วย และสุดท้ายได้ทำการประเมินคำบรรยายที่โมเดลสร้างเทียบกับคำบรรยายอ้างอิงโดยการใช้ตัวชี้วัด BLEU	th
dc.language.iso	th
dc.publisher	Silpakorn University
dc.rights	Silpakorn University
dc.subject	คำบรรยายภาพภาษาไทย	th
dc.subject	ชุดข้อมูลการจราจร	th
dc.subject	ชุดข้อมูล Flickr8k	th
dc.subject	โครงข่ายประสาทเทียมแบบ Convolutional	th
dc.subject	LSTM แบบสองทิศทาง	th
dc.subject	ตัวชี้วัด BLEU	th
dc.subject	Thai Captions	en
dc.subject	Traffic Dataset	en
dc.subject	Flickr8k Dataset	en
dc.subject	Convolutional Neural Networks (CNN)	en
dc.subject	Bidirectional LSTM	en
dc.subject	BLEU Metric	en
dc.subject.classification	Engineering	en
dc.subject.classification	Information and communication	en
dc.subject.classification	Electronics and automation	en
dc.title	Development of Thai Image Captioning Method Using Deep Learning	en
dc.title	การพัฒนาวิธีการสร้างคำบรรยายภาพภาษาไทยโดยใช้การเรียนรู้เชิงลึก	th
dc.type	Thesis	en
dc.type	วิทยานิพนธ์	th
dc.contributor.coadvisor	SOPON PHUMEECHANYA	en
dc.contributor.coadvisor	โสภณ ผู้มีจรรยา	th
dc.contributor.emailadvisor	phumeechanya_s@su.ac.th
dc.contributor.emailcoadvisor	phumeechanya_s@su.ac.th
dc.description.degreename	Master of Engineering (M.Eng.)	en
dc.description.degreename	วิศวกรรมศาสตรมหาบัณฑิต (วศ.ม)	th
dc.description.degreelevel	Master's Degree	en
dc.description.degreelevel	ปริญญาโท	th
dc.description.degreediscipline	ELECTRICAL ENGINEERING	en
dc.description.degreediscipline	วิศวกรรมไฟฟ้า	th
Appears in Collections:	Engineering and Industrial Technology

Files in This Item:

File	Description	Size	Format
640920027.pdf		14.74 MB	Adobe PDF
