Development of an evolutionary clustering method for gene-set enrichment analysis

Please use this identifier to cite or link to this item: http://ithesis-ir.su.ac.th/dspace/handle/123456789/4960

Title:	Development of an evolutionary clustering method for gene-set enrichment analysis (Thai title: การพัฒนาวิธีการจัดกลุ่มเชิงวิวัฒนาการสำหรับการวิเคราะห์ความสำคัญของกลุ่มยีนส์)
Authors:	Pacharaon SANPRASERT (พชรอร แสนประเสริฐ); Yutana Jewajinda (ยุทธนา เจวจินดา), Silpakorn University, JEWAJINDA_Y@SU.AC.TH
Issue Date:	28
Publisher:	Silpakorn University
Abstract:	This thesis proposes a data clustering approach based on particle swarm optimization (PSO) for pathway analysis in gene-set enrichment analysis of microarray data. The aim is to provide a new clustering tool that yields a diverse set of solutions, avoids becoming trapped in local maxima or minima, and produces solutions with the characteristics of well-formed clusters when compared with traditional hierarchical and k-means clustering. The proposed clusterer uses PSO, an evolutionary algorithm, and is implemented in R using RStudio. Its results are compared with those obtained from pathfindR and stats, gene-analysis and general-purpose clustering tools widely used by researchers. Starting from initial values of k such as 19, 25, and 30, the proposed PSO clusterer produced diverse partitions of 8 to 27 clusters (kpso) and the highest ℒ2 objective function value of all methods, 73.5088, whereas hierarchical and k-means clustering reached maximum ℒ2 values of 66.1339 and 57.4773, respectively. In terms of cluster compactness and separation, however, the proposed method ranked behind the other two methods. Moreover, across 431 runs, when the data were grouped into 14 clusters, the PSO clusterer produced up to 38 distinct solutions, while k-means and hierarchical clustering produced 24 and 1, respectively. In summary, the proposed clustering algorithm stands out in the diversity of its clustering solutions and in achieving the highest ℒ2 objective function value. However, it may not yield clearly better inter-cluster distances, so further development is needed to improve cluster compactness and separation.
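The abstract does not say how a particle encodes a clustering or how the ℒ2 objective is computed. The following is a minimal Python sketch of one common PSO clustering scheme, not the thesis's R implementation. It assumes each particle holds k centroids and that fitness is the within-cluster sum of squared distances (SSE), which it minimises; SSE is only a stand-in, since the abstract reports the highest ℒ2 value, so the thesis's objective is maximised. The names `assign` and `pso_cluster` and the parameter values (w = 0.72, c1 = c2 = 1.49) are hypothetical, chosen for illustration.

```python
import numpy as np


def assign(X, centroids):
    """Assign each point to its nearest centroid; return labels and SSE."""
    # squared Euclidean distances, shape (n_points, k)
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    sse = d2[np.arange(len(X)), labels].sum()
    return labels, sse


def pso_cluster(X, k, n_particles=20, iters=100, w=0.72, c1=1.49, c2=1.49, seed=None):
    """Hypothetical centroid-encoded PSO clusterer that minimises SSE (stand-in objective)."""
    rng = np.random.default_rng(seed)
    n = len(X)
    # each particle holds k centroids, initialised at distinct random data points
    pos = np.stack([X[rng.choice(n, size=k, replace=False)] for _ in range(n_particles)])
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_fit = np.array([assign(X, p)[1] for p in pos])
    best = pbest_fit.argmin()
    gbest, gbest_fit = pbest[best].copy(), pbest_fit[best]

    for _ in range(iters):
        r1 = rng.random(pos.shape)
        r2 = rng.random(pos.shape)
        # inertia + pull toward each particle's own best + pull toward the swarm best
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = pos + vel
        for i in range(n_particles):
            fit = assign(X, pos[i])[1]
            if fit < pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i].copy(), fit
                if fit < gbest_fit:
                    gbest, gbest_fit = pos[i].copy(), fit

    labels, sse = assign(X, gbest)
    # drop centroids that received no points and renumber clusters 0..k_used-1
    used, labels = np.unique(labels, return_inverse=True)
    return labels, len(used), sse


# usage: three well-separated 2-D blobs, clustered starting from k = 5
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.3, size=(30, 2)) for c in [(0, 0), (3, 3), (0, 3)]])
labels, k_used, sse = pso_cluster(X, k=5, seed=1)
print(k_used, round(float(sse), 2))
```

In this sketch, centroids that attract no points are dropped, so the number of clusters returned can be smaller than the initial k. The abstract does not state whether the thesis arrives at its 8 to 27 clusters (kpso) from k = 19, 25, and 30 in this way.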
This thesis proposes a data clusterer based on particle swarm optimization for pathway analysis in gene-set enrichment analysis of microarray data. The goal is to present a new clustering tool that yields diverse solutions, avoids convergence to local maxima or minima, and produces solutions with the characteristics of well-formed clusters compared with traditional clustering methods, namely hierarchical clustering and k-means clustering. The clusterer is built on particle swarm search, an evolutionary algorithm, and was implemented in R in RStudio; its results were compared with other clusterings from pathfindR and stats, gene-analysis and general clustering tools widely used by researchers. The results show that, from initial values of k such as 19, 25, and 30 clusters, the proposed PSO clusterer partitioned the data in many different ways, into 8 to 27 clusters (kpso), and gave the highest maximum ℒ2 objective function value among the clusterings compared, 73.5088, while hierarchical and k-means clustering gave maximum ℒ2 values of 66.1339 and 57.4773, respectively. In terms of cluster compactness and separation, the proposed method ranked behind the other two methods. In addition, across 431 runs, the proposed clusterer gave up to 38 distinct solutions, while k-means and hierarchical clustering gave 24 and 1, respectively, when the data were grouped into 14 clusters. In conclusion, the proposed clusterer stands out in the diversity of its solutions and also attains the largest maximum objective function value. However, it may not give clearly better results in inter-cluster distance, so the clustering algorithm needs further development to make its clustering properties, in both compactness and separation, more pronounced.
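The abstract compares methods by how many different solutions they return over repeated runs, but it does not say how two solutions are judged to be the same. Below is a minimal Python sketch that assumes two runs give the same answer when they induce the same partition of the data items, however the clusters happen to be numbered. The helpers `canonical` and `count_distinct` are hypothetical names for illustration.

```python
from typing import Hashable, Iterable, Sequence


def canonical(labels: Sequence[Hashable]) -> tuple[int, ...]:
    """Renumber clusters by order of first appearance, so label permutations compare equal."""
    mapping: dict[Hashable, int] = {}
    return tuple(mapping.setdefault(lab, len(mapping)) for lab in labels)


def count_distinct(partitions: Iterable[Sequence[Hashable]]) -> int:
    """Number of genuinely different partitions among repeated clustering runs."""
    return len({canonical(p) for p in partitions})


runs = [
    [0, 0, 1, 1, 2],  # run 1
    [2, 2, 0, 0, 1],  # same partition as run 1, clusters numbered differently
    [0, 0, 1, 2, 2],  # a different partition
]
print(count_distinct(runs))  # prints 2
```

Under this convention, a deterministic method such as agglomerative hierarchical clustering with a fixed linkage on fixed data yields a single partition for a given number of clusters. Randomly initialised methods such as k-means and PSO, by contrast, can yield many.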
URI:	http://ithesis-ir.su.ac.th/dspace/handle/123456789/4960
Appears in Collections:	Engineering and Industrial Technology

Files in This Item:

File	Description	Size	Format
620920062.pdf		4.81 MB	Adobe PDF
