ISSN 2410-7751 (Print)
ISSN 2410-776X (Online)
"Biotechnologia Acta" V. 10, No 5, 2017
https://doi.org/10.15407/biotech10.05.005
P. 5-18, Bibliography 82, English
Universal Decimal Classification: 004:591.5:612:616-006
CLUSTER ANALYSIS IN BIOTECHNOLOGY
Kavetsky Institute of Experimental Pathology, Oncology and Radiobiology of the National Academy of Sciences of Ukraine, Kyiv
The goal of this publication was to analyze cluster methods and the possibility of their application in biotechnology. Evidence found in the scientific literature was summarized and analyzed. The article gives a brief description of cluster analysis and its basic principles, with some examples of its application to biotechnological problems. Results of biotechnological studies that required cluster methods in combination with other mathematical approaches are considered. The conclusion evaluates the performed analysis and gives recommendations on applying cluster analysis methods in biotechnology.
Key words: cluster analysis, biotechnology.
© Palladin Institute of Biochemistry of National Academy of Sciences of Ukraine, 2017