The experimental results showed that the number of concepts extracted differed between the test data sets. This can be attributed to the data sets covering different contents from different courses, being of different sizes, and being written in different styles: each test data set delivered its content in a different writing style.
It was observed that a higher threshold of ≥ 15 produced more sensible concepts; that is, a larger share of the sensible concepts lay at frequencies of 15 or above. A slightly lower threshold of > 10 produced sensible compound concepts. These observations can be summarised by the phrase "the more often a concept appears in the text, the more relevant it is": a higher threshold produced more sensible concepts. Under the heuristics used, concepts that are significant in the given text material tend to occur frequently in the text. Other concepts occur less frequently, which may mean that they are either less significant or less inclusive. More inclusive concepts encompass general ideas in the text and are therefore mentioned frequently.
Less inclusive concepts, on the other hand, cover specific areas of the text and hence occur less frequently.
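As a rough illustration of this frequency-based selection (a sketch, not the actual ACMC implementation), the Python fragment below keeps single-word concepts with frequencies of at least 15 and two-word compounds with frequencies above 10, the thresholds discussed above; the regular-expression tokenisation and the stop_words argument are simplifying assumptions.

import re
from collections import Counter

def extract_concepts(text, stop_words, single_threshold=15, compound_threshold=10):
    """Frequency-based selection: single-word concepts at frequency >= single_threshold,
    two-word compound concepts at frequency > compound_threshold."""
    words = re.findall(r"[a-z]+", text.lower())
    content = [w for w in words if w not in stop_words]

    single_counts = Counter(content)
    # Compound candidates: two consecutive words, neither of which is a stop word.
    pair_counts = Counter(
        (a, b) for a, b in zip(words, words[1:])
        if a not in stop_words and b not in stop_words
    )

    singles = {w for w, n in single_counts.items() if n >= single_threshold}
    compounds = {f"{a} {b}" for (a, b), n in pair_counts.items() if n > compound_threshold}
    return singles, compounds

Raising either threshold trades recall for precision: fewer concepts are returned, but a larger proportion of them tends to be sensible.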
The results showed that a large number of the extracted relations had frequencies of ≤ 3, while the small number of relations with frequencies of ≥ 3 were seen as sensible. Because the extraction of potential relations relied on the extracted concepts, non-sensible concepts produced non-sensible relations. Since the ACMC does not indicate the direction of a relationship, relations between concepts such as 'automaton' and 'finite automaton', and 'finite automaton' and 'automaton', are identified as one relationship.
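A minimal sketch of this relation extraction (as noted later in the discussion and in the conclusions, the ACMC records a relation when two concepts appear in the same sentence): each unordered pair of co-occurring concepts is counted once, so 'automaton'/'finite automaton' and its reverse collapse into a single relation. The sentence splitting, the substring matching and the frequency cut-off of 3 are simplifying assumptions made for illustration.

import re
from collections import Counter

def extract_relations(text, concepts, min_count=3):
    """Count undirected co-occurrences of known concepts within a sentence."""
    counts = Counter()
    # Crude sentence splitting on ., ! and ? -- not the actual preprocessing.
    for sentence in re.split(r"[.!?]+", text.lower()):
        present = [c for c in concepts if c in sentence]
        for i, a in enumerate(present):
            for b in present[i + 1:]:
                # A frozenset makes (a, b) and (b, a) the same undirected relation.
                counts[frozenset((a, b))] += 1
    return {pair: n for pair, n in counts.items() if n >= min_count}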
It is important to note that the basic method used by the ACMC to extract concepts and relations was based on term occurrence; no syntactic analysis or auxiliary ontologies were used. The ACMC was not able to identify features such as equations, tables, formulas and algorithms, and therefore could not distinguish the text inside these features from the running text of the material. This could explain why the ACMC produced some non-sensible concepts and relations.
Here, we refer to concepts and relations that are not relevant to the text material as non-sensible. For instance, parts of equations, misspelt words, non-English words (some texts contained Finnish words) and verbs were considered non-sensible concepts.
In the implementation of the ACMC, we defined compound concepts as two consecutive words. The ACMC is therefore restricted to finding compound concepts that are exactly two words long, which results in the extraction of incomplete and therefore non-sensible compound concepts. For example, the ACMC produced 'nondeterministic finite' and 'finite automaton' as compound concepts where the correct concept is 'nondeterministic finite automaton'. The extraction of non-sensible concepts in turn resulted in the extraction of non-sensible relations, for example a relation between 'nondeterministic finite' and 'automaton'. In cases where, for instance, 'nondeterministic finite' and 'finite automaton' were extracted as related, transitivity could be used as a heuristic to identify longer concepts, in this case 'nondeterministic finite automaton'. Other heuristics could also be integrated into the ACMC, for instance checking for concepts in chapter and section headings, in the topic and introductory sentences of paragraphs, and among emphasized words. These considerations could produce better results.
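One possible reading of this transitivity heuristic, sketched under the assumption that compound concepts are plain whitespace-separated strings: two bigrams that share a word at their boundary are chained into a longer concept.

def chain_compounds(compounds):
    """Chain overlapping two-word concepts into longer ones, e.g.
    'nondeterministic finite' + 'finite automaton' -> 'nondeterministic finite automaton'."""
    merged = set()
    for left in compounds:
        for right in compounds:
            lw, rw = left.split(), right.split()
            # Chain when the last word of one bigram is the first word of the other.
            if left != right and lw[-1] == rw[0]:
                merged.add(" ".join(lw + rw[1:]))
    return merged

# chain_compounds({"nondeterministic finite", "finite automaton"})
# returns {"nondeterministic finite automaton"}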
Additional heuristics for finding more sensible relations could include checking for potential relations in topic sentences, as such sentences briefly introduce the concepts to be discussed. The ACMC checks for relations between concepts within a single sentence; potential relations could also be checked for within whole paragraphs. Significance weights could then be assigned so that a relation between concepts that co-occur in a sentence, and hence also within a paragraph, weighs more than a relation between concepts that co-occur only in the same paragraph. The ACMC does not indicate the direction or the names of relations; this functionality could be implemented and added to the ACMC as future development.
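A hedged sketch of that weighting idea: sentence-level and paragraph-level co-occurrences are scored separately, so a pair found in the same sentence (and therefore also in the same paragraph) receives a higher score than a pair that only shares a paragraph. The paragraph splitting on blank lines and the weight values are illustrative assumptions, not part of the ACMC.

import re
from collections import Counter

def weighted_relations(text, concepts, sentence_weight=2.0, paragraph_weight=1.0):
    """Score concept pairs so that sentence co-occurrence counts more than
    co-occurrence only within the same paragraph."""
    scores = Counter()
    for paragraph in re.split(r"\n\s*\n", text.lower()):
        for sentence in re.split(r"[.!?]+", paragraph):
            present = sorted(c for c in concepts if c in sentence)
            for i, a in enumerate(present):
                for b in present[i + 1:]:
                    scores[frozenset((a, b))] += sentence_weight
        present = sorted(c for c in concepts if c in paragraph)
        for i, a in enumerate(present):
            for b in present[i + 1:]:
                scores[frozenset((a, b))] += paragraph_weight
    return scores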
There was an observable difference in the number of concepts extracted by the ACMC and the number of concepts counted from the manually constructed maps.
This difference can be attributed to the style used in constructing the concept maps.
While the ACMC used frequency of occurrence to identify and extract concepts from the text, the manually constructed concept maps did not follow a particular method for extracting concepts and relations. In constructing the maps by hand, we read and understood the text and then derived concepts, and relations between those concepts, from the test data provided. The sensible concepts extracted by the ACMC included only general concepts, as these occurred frequently compared with the less inclusive concepts. The manually constructed concept maps, on the other hand, included both general and less inclusive concepts, differentiated by their position in the map (we used a tree-like, hierarchical structure, with the more general concepts at the top and the less inclusive concepts at the bottom). A different approach from the one used by the ACMC was used to establish relations between concepts in the manually constructed concept maps.
We relied on our understanding of the text to discover relations between concepts, and some of these relations connected concepts that did not appear in the same sentence, whereas the ACMC extracted a relation only if the concepts appeared in the same sentence. From the results, a large number of the relations extracted by the ACMC were deemed non-sensible. This accounts for the difference between the results produced by the ACMC and the manually constructed concept maps.
Chapter 7 Conclusions
In general, it is difficult for an individual to construct an effective concept map, as concept maps vary from one individual to another: the individual's understanding and perspective of the subject concerned accounts for the variation between concept maps of the same subject. Automatic or semi-automatic construction of concept maps reduces the bias that an individual might introduce in an attempt to create a "good" concept map.
In this research, we have discussed several semi-automatic and fully automatic approaches for constructing concept maps. Semi-automatic construction requires some assistance from the user to complete the concept map; the tool retrieves information for the map and suggests concepts or relations to the user. In automatic construction, the whole map is constructed automatically, with no assistance from the user.
In this thesis, we introduced a method for constructing concept maps automatically from text. The method selects concepts based on the frequency of terms in the text, and relations based on the frequency of co-occurrence of concepts in the same sentence. Our experiments suggest that the method can select relevant concepts, and especially compound concepts, well, but the extraction of relevant relations would require further research.
Appendix A
DM book concept map produced by Leximancer
Figure A.1: Concept map for the DM book created by Leximancer.
Figure A.2: List of concepts from DM book extracted by Leximancer.
Appendix B
TFCS concept map produced by Leximancer
Figure B.1: Concept map for the TFCS created by Leximancer.
Figure B.2: List of concepts from TFCS extracted by Leximancer.
Appendix C
Scientific Writing concept map produced by Leximancer
Figure C.1: Concept map for the Scientific Writing material created by Leximancer.
Figure C.2: List of concepts from Scientific Writing material extracted by Leximancer.
Appendix D
Stop words list
the, of, and, to, a, in, that, is, he, for, it, with, as, his, on, be, at, by, I, this, not, are, but, from, have, an, they, which, one, you, were, her, all, she, there, would, their, we, him, been, has, when, who, will, more, no, if, out, so, said, what, up, its, about, into, than, them, can, only, other, new, some, time, could, these, two, may, then, do, first, any, my, now, such, like, our, over, man, me, even, most, made, after, also, did, many, before, must, through, back, years, where, much, your, way, well, down, etc, might, how, or, x, s, r, p, g, e, c, b, section, z, u, t, o, n, m, l, k, f, d, h, j, q, v, w, y, too
There were a few additions to the top 100 stop words from http://www.edict.com.hk/TextAnalyser/wordlists.htm. Such additions included letters of the alphabet and a few words, for instance "section". These additions were made after several tests with the given test material: it was observed that the results contained single letters of the alphabet as concepts, since these letters were used frequently in the text.
The text also contained words such as "section", "chapter" and "example", which come from the LaTeX commands used in the text.
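To make the effect of these additions concrete, here is a small sketch of how such an extended stop word list could be applied before the frequency counts; the abbreviated word sets stand in for the full list above, and the filtering function is an illustrative assumption rather than the actual ACMC code.

import re

# Abbreviated: the full base list (top 100 words plus additions) is given above.
BASE_STOP_WORDS = {"the", "of", "and", "to", "a", "in", "that", "is"}

# Additions made after the tests: single letters of the alphabet and a few
# words, such as "section", that come from the LaTeX structure of the material.
EXTRA_STOP_WORDS = set("abcdefghijklmnopqrstuvwxyz") | {"section", "etc", "too"}

STOP_WORDS = BASE_STOP_WORDS | EXTRA_STOP_WORDS

def filter_tokens(text):
    """Drop stop words, including single letters, before frequency counting."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]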
Appendix F
Hand-made concept map from TFCS
Figure F.1: Hand-made concept map constructed from TFCS test data.
Appendix G
Hand-made concept map from Scientific writing material
Note: The figure shows part of the hand constructed concept map from the Scientific Writing material.
Figure G.1: Hand-made concept map constructed from Scientific writing test data.
Appendix H
Hand-made concept map from Data mining material
Note: The figure shows part of the hand constructed concept map from the Data mining material.
Figure H.1: Hand-made concept map constructed from Data mining test data.
Bibliography
[AKM+03] Harith Alani, Sanghee Kim, David E. Millard, Mark J. Weal, Wendy Hall, Paul H. Lewis, and Nigel R. Shadbolt. Automatic ontology-based knowledge extraction from web documents. IEEE Intelligent Systems, 18(1):14–21, 2003.
[APC01] A. O. Alves, F. C. Pereira, and A. Cardoso. Automatic reading and learning from text. In ISAI '2001: Proceedings of the International Symposium on Artificial Intelligence, pages 302–310, December 2001.
[BB00] E. Bruillard and G.L. Baron. Computer-based concept mapping: a review of a cognitive tool for students. In ICEUT ’00: Proceedings of Conference on Educational Uses of Information and Communication Technologies, pages 331–338. Publishing House of Electronics Industry (PHEI), 2000.
[BCW02] Christopher Brewster, Fabio Ciravegna, and Yorick Wilks. User-centred ontology learning for knowledge management. In NLDB '02: Proceedings of the 6th International Conference on Applications of Natural Language to Information Systems, Revised Papers, pages 203–207, London, UK, 2002. Springer-Verlag.
[BR] R. Byrd and Y. Ravin. Identifying and extracting relations in text.
[CCA+02] Alberto J. Cañas, Marco Carvalho, Marco Arguedas, D. B. Leake, A. Maguitman, and T. Reichherzer. Mining the web to suggest concepts during concept mapping: Preliminary results. XIII Simpósio Brasileiro de Informática na Educação, 2002.
[CCH+03] J.W. Coffey, A.J. Cañas, G. Hill, R. Carff, T. Reichherzer, and N. Suri. Knowledge modeling and the creation of El-Tech: a performance support and training system for electronic technicians. Expert Systems with Applications, 25:483–492, November 2003.
[CGR91] Gail A. Carpenter, Stephen Grossberg, and David B. Rosen. Fuzzy ART: Fast stable learning and categorization of analog patterns by an adaptive resonance system. Neural Netw., 4(6):759–771, 1991.
[CLR86] M. Callon, J. Law, and A. Rip, editors. Mapping the dynamics of science and technology: Sociology of science in the real world. Macmillan, London, 1986.
[Coo] James W. Cooper. Visualization of relational text information for biomedical knowledge discovery.
[CSRL01] Thomas H. Cormen, Clifford Stein, Ronald L. Rivest, and Charles E. Leiserson. Introduction to Algorithms. McGraw-Hill Higher Education, 2nd edition, 2001.
[DDF+00] Scott C. Deerwester, Susan T. Dumais, George W. Furnas, Thomas K. Landauer, and Richard A. Harshman. Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41:391–407, 2000.
[dRdCJF04] F.E.L. da Rocha, J.V. da Costa Jr, and E.L. Favero. A new approach to meaningful learning assessment using concept maps: ontologies and genetic algorithms. In CMC '04: Proceedings of the First International Conference on Concept Mapping, 2004.
[DSB] David L. Darmofal, Diane H. Soderholm, and Doris R. Brodeur. Using concept maps and concept questions to enhance conceptual understanding.
[Fel98] Christiane Fellbaum, editor. WordNet: An Electronic Lexical Database (Language, Speech, and Communication). The MIT Press, May 1998.
[FMG05] Blaz Fortuna, Dunja Mladenic, and Marko Grobelnik. Semi-automatic construction of topic ontology. In Conference on Data Mining and Data Warehouses (SiKDD 2005), October 2005.
[FYI02] Hideo Funaoi, Etsuji Yamaguchi, and Shigenori Inagaki. Collaborative concept mapping software to reconstruct learning processes. In ICCE '02: Proceedings of the International Conference on Computers in Education, page 306, Washington, DC, USA, 2002. IEEE Computer Society.
[GS94] Brian R. Gaines and Mildred L. G. Shaw. Using knowledge acquisition and representation tools to support scientific communities. In AAAI '94: Proceedings of the twelfth national conference on Artificial intelligence (vol. 1), pages 707–714, Menlo Park, CA, USA, 1994. American Association for Artificial Intelligence.
[HBN96] H. E. Herl, E. L. Baker, and D. Niemi. Construct validation of an approach to modeling cognitive structure of U.S. history knowledge. Journal of Educational Research, 89:206–218, 1996.
[HCCN02] Robert R. Hoffman, John W. Coffey, Mary Jo Carnot, and Joseph D. Novak. An empirical comparison of methods for eliciting and modeling expert knowledge. In Proceedings of the Human Factors and Ergonomics Society 46th Annual Meeting. Human Factors and Ergonomics Society, 2002.
[Jak03] Maria Jakovljevic. Concept mapping and appropriate instructional strategies in promoting programming skills of holistic learners. In SAICSIT '03: Proceedings of the 2003 annual research conference of the South African institute of computer scientists and information technologists on Enablement through technology, pages 308–315, Republic of South Africa, 2003. South African Institute for Computer Scientists and Information Technologists.
[JBS00] J.A. Johnsen, D.E. Biegel, and R. Shafran. Concept mapping in mental health: uses and adaptations. Evaluation and Program Planning, 23:67–75, February 2000.
[JMF99] A. K. Jain, M. N. Murty, and P. J. Flynn. Data clustering: a review.
ACM Comput. Surv., 31(3):264–323, 1999.
[Joa99] Thorsten Joachims. Making large-scale SVM learning practical. In B. Schölkopf, C. Burges, and A. Smola, editors, Advances in Kernel Methods - Support Vector Learning. MIT Press, 1999.
[Kol93] Janet Kolodner. Case-based reasoning. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1993.
[Lea96] David B. Leake. Case-Based Reasoning: Experiences, Lessons and Future Directions. MIT Press, Cambridge, MA, USA, 1996.
[Liu03] H Liu. MontyTagger. Cambridge, Mass, 1.2 edition, 2003.
[LMR+03] David B. Leake, Ana Maguitman, Thomas Reichherzer, Alberto J. Cañas, Marco Carvalho, Marco Arguedas, Sofia Brenes, and Tom Eskridge. Aiding knowledge capture by searching for extensions of knowledge models. In K-CAP '03: Proceedings of the 2nd international conference on Knowledge capture, pages 44–53, New York, NY, USA, 2003. ACM Press.
[MBF+90] George A. Miller, Richard Beckwith, Christiane Fellbaum, Derek Gross, and Katherine J. Miller. Introduction to WordNet: An on-line lexical database. Int J Lexicography, 3(4):235–244, January 1990.
[MF90] S. Muggleton and C. Feng. Efficient induction of logic programs. In Proceedings of the 1st Conference on Algorithmic Learning Theory, pages 368–381. Ohmsma, Tokyo, Japan, 1990.
[MMJ94] K. M. Markham, J. J. Mintzes, and M. G. Jones. The concept map as a research and evaluation tool: Further evidence of validity. Journal of Research in Science Teaching, 31:91–101, 1994.
[MS01] Alexander Maedche and Steffen Staab. Learning ontologies for the semantic web. In Semantic Web 2001 (at WWW10), May 1, 2001, Hongkong, China, 2001.
[MS04] Alexander Maedche and Steffen Staab. Ontology learning. In Steffen Staab and Rudi Studer, editors,Handbook on Ontologies, International Handbooks on Information Systems, pages 173–190. Springer, 2004.
[MSS99] J.R. McClure, B. Sonak, and H.K. Suen. Concept map assessment of classroom learning: Reliability, validity, and logical practicality. Journal of Research in Science Teaching, pages 475–492, 1999.
[NC06] Joseph D. Novak and Alberto J. Cañas. The theory underlying concept maps and how to construct them. Technical report, Florida Institute for Human and Machine Cognition, 2006.
[NG84] Joseph D. Novak and Bob Gowin. Learning how to learn. Cambridge University Press, Cambridge, 1984.
[nHC+04] A. J. Cañas, G. Hill, R. Carff, N. Suri, J. Lott, and T. Eskridge. CmapTools: A knowledge modeling and sharing environment. In A. J. Cañas, J. D. Novak, and F. M. González, editors, Concept maps: Theory, methodology, technology. Proceedings of the first international conference on concept mapping, volume I, pages 125–133. Universidad Pública de Navarra, Pamplona, Spain, 2004.
[NVG03] Roberto Navigli, Paola Velardi, and Aldo Gangemi. Ontology learning and its application to automated terminology translation. IEEE Intelligent Systems, 18(1):22–31, 2003.
[PC00] Francisco C. Pereira and Amílcar Cardoso. Clouds: A module for automatic learning of concept maps. In Michael Anderson, Peter Cheng, and Volker Haarslev, editors, Diagrams, volume 1889 of Lecture Notes in Computer Science, pages 468–470. Springer, 2000.
[POC00] Francisco Câmara Pereira, A. Oliveira, and Amílcar Cardoso. Extracting concept maps with clouds. In Proceedings of the Argentine Symposium of Artificial Intelligence (ASAI), 2000.
[RF05] Ryan Richardson and Edward A. Fox. Using concept maps in digital libraries as a cross-language resource discovery tool. In JCDL '05: Proceedings of the 5th ACM/IEEE-CS joint conference on Digital libraries, pages 256–257, New York, NY, USA, 2005. ACM Press.
[RFP06] A.B. Rendas, M. Fonseca, and P.R. Pinto. Toward meaningful learning in undergraduate medical education using concept maps in a PBL pathophysiology course. Adv Physiol Educ, 30(1):23–29, 2006.
[RGFP06] Ryan Richardson, Ben Goertzel, Edward A. Fox, and Hugo Pinto. Automatic creation and translation of concept maps for computer science-related theses and dissertations. In Proceedings of 2nd Concept Mapping Conference, pages 160–163. Citeseer, 2006.
[RRS98] Diana C. Rice, Joseph M. Ryan, and Sara M. Samson. Using concept maps to assess student learning in the science classroom: Must different methods compete? Journal of Research in Science Teaching, 35(10):1103–1127, 1998.
[RT02] Kanagasabai Rajaraman and Ah-Hwee Tan. Knowledge discovery from texts: a concept frame graph approach. In CIKM ’02: Proceedings of the eleventh international conference on Information and knowledge management, pages 669–671, New York, NY, USA, 2002. ACM Press.
[RW33] Y. Ravin and N. Wacholder. Extracting names from natural-language text, 2033.
[Sai01] H. Saito. A semi-automatic construction method of concept map based on dialog contents, 2001.
[Sal91] Gerard Salton. Developments in automatic text retrieval. Science, 253:974–979, 1991.
[SH05] Andrew E. Smith and Michael S. Humphreys. Evaluation of unsupervised semantic mapping of natural language with Leximancer concept mapping. Behavior Research Methods, 38:262–279, 2005.
[SKUP+04] J.E. Sims-Knight, R.L. Upchurch, N. Pendergrass, T. Meressi, P. Fortier, P. Tchimev, R. VonderHeide, and M. Page. Using concept maps to assess design process knowledge. In FIE 2004: Frontiers in Education, volume 2, pages F1G-6–10, October 2004.
[Smi05] Andrew E. Smith. Leximancer manual (Version 2.2). University of Queensland, 2005.
[SRF03] Rao Shen, Ryan Richardson, and Edward A. Fox. Concept maps as visual interfaces to digital libraries: summarization, collaboration, and automatic generation, May 2003.
[ST93] D. D. Sleator and D. Temperley. Parsing English with a link grammar. In Third International Workshop on Parsing Technologies, 1993.
[SWST04] Pei-Chi Sue, Jui-Feng Weng, Jun-Ming Su, and Shian-Shyong Tseng. A new approach for constructing the concept map. ICALT, 0:76–80, 2004.
[SZL14] Chen Sun, Ming Zhao, and Yangjing Long. Learning concepts and taxonomic relations by metric learning for regression. Communications in Statistics - Theory and Methods, 43(14):2938–2950, 2014.
[Tex] Word frequency list. http://www.edict.com.hk/TextAnalyser/wordlists.htm. Virtual Language Center, University of Victoria, Hong Kong. Accessed: 2006-09-30.
[TTL01] Chang-Jiun Tsai, Shian-Shyong Tseng, and Chih-Yang Lin. A two-phase fuzzy mining and learning algorithm for adaptive learning environment. In ICCS '01: Proceedings of the International Conference on Computational Science - Part II, pages 429–438, London, UK, 2001. Springer-Verlag.
[WH02] Shih-Hung Wu and Wen-Lian Hsu. SOAT: A semi-automatic domain ontology acquisition tool from Chinese corpus. In COLING, 2002.
[Wie96] Anna Wierzbicka. Semantics: Primes and Universals. Oxford Univer-sity Press, New York, 1996.
[WSL06] Ryen W. White, Hyunyoung Song, and Jay Liu. Concept maps to support oral history search and use. In Gary Marchionini, Michael L. Nelson, and Catherine C. Marshall, editors, JCDL, pages 192–193. ACM, 2006.
[ZS01] GuoDong Zhou and Jian Su. Named entity recognition using an HMM-based chunk tagger. In ACL '02: Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, pages 473–480, Morristown, NJ, USA, 2001. Association for Computational Linguistics.