A generic framework for evaluating Adaptive Educational Hypermedia authoring systems

Evaluating MOT, AHA! and WHURLE to recommend on the development of AEH authoring systems

by N.J.C. Primus

A thesis submitted in partial fulfilment of the requirements for the degree of Master in Business Information Technology (BIT), University of Twente. March 10, 2005.

dr. Tanya Bondarouk, Department of Business Information Systems (BBT) Track of Electronic Business (first chairman of the board);
dr. ir. Bedir Tekinerdogan, Faculty of Electrical Engineering, Mathematics and Computer Science (EWI); multimedia software technology (second chairman of the board);
dr. ir. Rik Min, Faculty of Behavioural Sciences (GW) & Center of Telematics and Information Technology (CTIT) (project leader of the Adapt project);
dr. Italo De Diana, Faculty of Behavioural Sciences (GW) (member of the Adapt project; advisor).

Abstract

This report describes research on the authoring of Adaptive Educational Hypermedia (AEH). The main research goal is to develop a generic evaluation framework that makes it possible to assess AEH authoring applications. The framework is based on the concerns of the three different stakeholders involved in AEH authoring, which lead to three groups of evaluation criteria: technological, educational/conceptual and end-user related. Another group of results is derived from tests on MOT, AHA! and WHURLE, three AEH systems developed within the scope of the EU Minerva/ADAPT project. The evaluation framework is the result of a literature study on AEH research, quality factors and existing evaluation models. During the development process, the framework has been tested on MOT, AHA! and WHURLE. The generic framework for evaluating AEH authoring systems presented in this report is both valuable and useful. However, the model was developed using only one test course, applied once on each system, by one researcher. Because of this, some criteria in the framework are, to a certain extent, considered questionable. These questionable criteria, and the one-off nature of the tests, are reasons for further research on the framework. MOT, AHA! and WHURLE proved to be suitable for authoring adaptive course materials. They belong to the front line of new educational software, an area already shifting rapidly from a supportive to an intelligent character. The tests show that AHA! is somewhat ahead of the other two systems on certain points, and WHURLE somewhat behind. The reasons are the longer lifespan of the AHA! system and the fact that WHURLE only offers a presentation function and no proper module for course development. All in all, considering that the three systems are still under development, MOT, AHA! and WHURLE will eventually be 'good' AEH authoring systems. The basic conceptual structures are solid and well thought-out.
However, the developers have to reconsider and perhaps extend the adaptive features, improve the general possibilities, such as the editors, and work on the user-friendliness of the systems.

Acknowledgments

No one will disagree that graduation time is about one of the strangest time spans of a man's life, I think. Well, what made mine so out of the ordinary? Is it the fact that I'm graduating at the university, while almost every BIT student graduates at a company? Or the fact that this is my second assignment, as the first one went up in smoke? Perhaps it's the fact that I had no first supervisor for the first three months. Or could it be the fact that I have lingered in Enschede for so long now that makes it special? It could well be the fact that I, as a BIT student, had to deal with supervisors so diverse that it sometimes seemed impossible to unite all their concerns and comments, while at the same time our meetings were really interesting and at times even funny. Either way, I had a great time during the last months of my study. I accidentally came across the assignment on AEH authoring systems and the group of Behavioural Sciences one day, and from the first moment on Rik and I got on really well. The assignment attracted me very much, partly because of the enthusiasm Rik displayed. The first few months, when meetings with Rik and Italo were the order of the day, were mostly filled with installing and testing MOT, AHA! and WHURLE. I went to Eindhoven to meet with the developers of the systems. Together with Rik and Italo I went to the AH 2004 conference, also in Eindhoven, which made for some remarkable days. Two of the fellow researchers (Alexandra from MOT and Craig from WHURLE) even came to Enschede one day to discuss the Minerva/ADAPT project and help me with installing some new test software.

Eventually, I also started working on my report and Bedir was of great assistance to keep things going here. After Tanya joined the project, she more than double made up for the time she was not around. The whole graduation project sped up and half a year later I am now presenting this report. Thanks go out to Alexandra for commenting on my work and to Craig, David and Natasha for assisting me with the software. I would like to thank Rik and Italo for letting me work with them on this assignment. It really was some experience to get to know these people, and not only because of the different views they displayed in comparison with the background of my own study. I would like to thank Bedir for his keen eye and structured approach all through the project, it certainly contributed to the overall quality of the research. I would like to thank Tanya for the way she guided me through the jungle a graduation process can sometimes be. Most of the time, I felt we were on the same track and our meetings were always fruitful.

Language issues in English were not a thing to worry about for me, as Grainne was really wonderful in assisting me with this. I'm very grateful to her for correcting my English writing the way she did. Writing this report in English was a good exercise for me personally, just as the fact that it is my first report ever edited completely in LATEX. Special thanks go to all the people around me, as my supervisors at the university definitely weren't the only ones suffering from the ups and downs during my graduation. The time has come to say thanks to you: my friends, my roommates, my family and to Judith, for being there for me when I needed you all, even when I wasn't there myself.

Niels Primus, March 2005

CHAPTER 1. Introduction

This report describes a study on the advancement of research on intelligent educational hypermedia systems. Over the past ten years the application of ICT in education has, as in any other field, increased explosively. In 2006, an estimated 25 billion US dollars or more will be spent on eLearning [46]. Well-known commercial web-based educational systems are WebCT and Blackboard (Footnote here: www.webct.com and www.blackboard.com). Examples of systems which have their roots in the academic world are ELM-ART [19] and Interbook [20]. The focus of this kind of system used to be on course management, and the applications tended to be more or less a network of static pages. In fact, due to the static nature of the educational software domain, the early commercial web-based educational systems are known for their well-defined structure, which provides only one learning path, suitable for an average student [3]. Of course this average student does not exist. For this reason, many researchers throughout the world are building new educational hypermedia systems or are adjusting 'old' ones. Contrary to most existing systems, these new applications offer personalisation. Examples of such systems are MOT, My Online Teacher [23], AHA!, Adaptive Hypermedia for All! [9] and WHURLE, Web-based Hierarchical Reactive Learning Environment [61]. These systems try to imitate the real learning experience, which is shaped by the challenge teachers face in reconciling the learning materials on the one hand with the individual needs and characteristics of the students on the other. This relationship depicts well the area of tension between the static and dynamic aspects of the learning process. Where the course materials are mostly a single book or a reader, each student has to be served individually in terms of his cognitive capabilities, perception, knowledge level, etcetera. 
Roughly speaking, there are three categories of eLearning [56]: eLearning with eBooks, eLearning by supportive systems and eLearning through real eLearning environments. This report focuses on the last of these, as adaptivity is only reached through applications that are able to fully comprehend the user and his characteristics and react to them accordingly. In the Netherlands, all the universities and most of the colleges have Electronic Learning Environments [5]. They are well aware of the fact that these systems are only of the first generation and are only capable of course management, not content management. The latter is necessary to enable educational applications to evolve and leave the stage of being only supportive. One of the roads to personalisation of web-based systems lies in adaptivity, the feature of hypertext and hypermedia that allows one to adapt the contents to user needs [8]. The resulting systems are called Adaptive Educational Hypermedia (AEH). The terms AEH systems and AEH authoring systems are used interchangeably, as an AEH system (or environment) is nearly always by definition an authoring system. The user of an AEH application, usually a teacher or course developer, designs lessons through the application. This makes him instantly an author and, therefore, the system an authoring system. There are other kinds of authoring systems though; some AEH systems only have some authoring tools and some are merely for presenting lessons. Personalisation is a term not only familiar to the eLearning community. Especially in eCommerce, there has been a huge increase in the research on and the application of adaptive techniques. The reason is the enormous growth in online commerce and the need for individually tailored products and services to attract and retain consumers. A concept related to adaptivity is adaptability; the two are often confused. 
Whereas adaptability merely is the feature of a system to enable users to change the characteristics according to their preferences, adaptivity is a set of intelligent techniques with which a system adapts itself to the needs and preferences of a user. Adaptive systems make decisions on the basis of user information, either provided by the user himself or derived from input analysis. Adaptable systems, on the other hand, are not based on intelligent algorithms. This chapter gives some basic information on adaptive systems and educational hypermedia. Furthermore, some background information is given on the projects this study is part of. The research objectives, along with the goals, the research questions and the approach, are described and, finally, the structure of the report is given.

1.1 Problem statement

According to Cristea [22, 36, 27], Devedzic [43], Brusilovsky [15] and many others [50, 39, 7, 44, 64, 65], the question is not whether adaptivity in educational hypermedia is necessary, but in what form it should be added. In order to answer this question, several research projects have been started. One of them is the Minerva/ADAPT project [35], funded by the European Union. Within the scope of this project, systems like MOT are developed to create a test environment for AEH authoring systems, so that new techniques can be integrated and tested in a real-life setting. This report stresses the importance of continuous research in the field of AEH and discusses several evaluations of AEH systems to ultimately offer recommendations to the community of researchers involved in the field. Devedzic [43] gives an analysis of the key issues in next-generation web-based education. He names the need for sharing and reuse of material, the proliferation of standards for communicating and the ability of end-users (teachers) to deal with ICT as the key challenges for the field. Others, like Cristea and Garzotto [36], emphasise the soundness of the design as the most important factor in AEH authoring.

The University of Twente participates in the already mentioned EU Minerva/ADAPT project through the research of the Faculty of Behavioural Sciences [57, 58]. The aim of the project is to evaluate the systems MOT, AHA! and WHURLE by building a multimedia (six-dimensional) test course on a transistor circuit, including an intelligent simulation. The goal is to find out what the performance of the generated products, built in MOT, AHA! and WHURLE, is. Two other goals are to find out how the characteristics of the test products match those of the original transistor product (which was built in HTML, contained a Java transistor applet and had no adaptive features) and how they match the characteristics specified by the three systems. They also want to investigate how building blocks -- discrete audio-visual components and standard elements from libraries -- can enrich open learning environments and make them more efficient and effective for all learning styles. Their project should finally generate some good and bad practices (and techniques), some quality factors for courseware authoring (and methods) and practice for adaptation over pedagogical considerations as well as learners' cognitive profiles (learning or learner styles). Learner styles have traditionally received little attention in research on AEH authoring applications. The main reason for this is that most current systems are developed by IT researchers and not by people with an educational background. The integration of learner styles in AEH systems is necessary in order to create real educational adaptive hypermedia. Examples of learner styles are the field-dependent and the field-independent style [72]. The first describes learners who prefer structures, social contents and materials related to their own experience. Field-independent learners perceive analytically, make concept distinctions and prefer impersonal orientation. 
Kolb [51] defines learner styles by introducing two axes: task (preferring to do or watch) and emotion (think or feel). Learner styles are established according to the position along the axes. Possibilities are activist (accommodator, do and feel: concrete-active); diverger (watch and feel: concrete-reflective); theorist (assimilator, watch and think: abstract-reflective) and pragmatist (converger, do and think: abstract-active). In order to carry out their evaluation research, the group of Behavioural Sciences has planned to develop two test or demonstration courses. One is the fully adaptive six-dimensional transistor circuit course and the other is a lighter, stripped version of this. One of the original research objectives of the ADAPT project is to build a prototype adaptive authoring tool. Another goal concerns evaluation and will look at the prototype from as many viewpoints as possible. With the prototype tool, the Twente members of the project were supposed to do their share of evaluation, being a combined product/process evaluation. However, instead of one adaptive prototype system, the project led to several such systems, namely MOT, AHA! and WHURLE. This report describes the research that filled the gap formed by the new challenge of having to evaluate several systems instead of one generic AEH authoring application. Instead of evaluating one generic application for authoring AEH, a generic evaluation framework is constructed so that any AEH authoring system can be evaluated. As such, the study described in this report is embedded in the Twente sub project. It contributes to prototyping the three test examples in MOT, AHA! and WHURLE and evaluating them afterwards. The test products are valuable for the Faculty of Behavioural Sciences, because they can be used directly for evaluation and for studying the conceptual structure of the course material. 
This research also assists in the conclusions and recommendations the Twente report makes and supports them. Recapitulating, the main problem this report focuses on is the lack of evaluation frameworks for Adaptive Educational Hypermedia authoring applications.

1.2 Research Objectives

As already mentioned in paragraph 1.1 and described in Appendix A, the Minerva/ADAPT [35] project strives towards a "European platform of standards (guidelines, techniques and tools) for user modeling-based adaptability and adaptation". It hopes to reach this goal by generating new tools for authoring AEH, or Intelligent Tutoring Systems (ITS) as it sometimes is referred to. Three of these systems for authoring AEH are used in the research described in this report, they are MOT, AHA! and WHURLE. One of the ways to contribute to the development of standards is by designing an evaluation framework for AEH authoring applications. One of the objectives for the Twente sub project is a working multimedia (six-dimensional) transistor test lesson. Evaluating AEH is difficult because the underlying theories are either new or still under development, and there is no widespread agreement as to how the fundamental tasks (student modeling, adaptive pedagogical decision making, generation of instructional dialogues, etc.) should be performed [60]. The goal of this research is: To design a generic evaluation framework in order to assess AEH authoring applications. In AEH authoring applications, there are several stakeholders, all of which have different (sometimes overlapping) concerns. The generic nature of the framework is guaranteed by including the concerns of all stakeholders. The three different groups of stakeholders are in the educational or conceptual area (problem domain), in the technological area (solution domain) and in the end-user domain. The division in three domains is fixed: there is no superfluous domain and no domain is missing. In order to answer the concerns of the stakeholders, criteria have to be formulated. These criteria are to be found in literature on the subjects most closely related to the field of AEH authoring; literature on AEH, on software quality, on existing evaluation frameworks for (educational) AH, on other methods, and so on. 
The research goal results in the following research question: What are the criteria by which AEH authoring applications can be evaluated?

1.3 Approach

In order to be able to answer the main research question seven research subquestions are formulated. Every question is necessary for a small part of the research. Together, they are sufficient to gather enough information to answer the main question.

1. What is the current state of AEH authoring (evaluation) technology?
2. Which criteria on AEH authoring systems can be derived from a technological point of view?
3. Which criteria on AEH authoring systems can be derived from an educational point of view?
4. Which criteria on AEH authoring systems can be derived from an end-user point of view?
5. Which evaluation model can be given when considering the results of research subquestions 1 to 4?
6. Which new evaluation model can be designed when taking into account the shortcomings of the evaluation model of research subquestion 5?
7. Which recommendations can be given in the field of AEH when applying the generic evaluation model of research subquestion 6 to several AEH applications?

The research commences with a literature study on both the current status of AEH technology and the technological, educational and end-user criteria for evaluating AEH authoring systems in order to set up a preliminary evaluation framework. The aforementioned, combined with a review of literature on software quality and existing evaluation frameworks, discusses the first four research subquestions. The preliminary evaluation model, based on related work, is then updated, taking into account the shortcomings of the original evaluation model. The criteria that are selected for the framework have to be both necessary and sufficient, a selection method has to be formulated for this purpose. The result is a final and generic evaluation framework, applied using the transistor circuit course on MOT, AHA! and WHURLE resulting in recommendations in the field of AEH research. This covers research subquestion 7.

Figure 1.1: Research approach

The research approach in figure 1.1 shows how the different parts of the research are interconnected. As already stated, there is no standard or agreed evaluation framework for measuring the value and the effectiveness of adaptation yielded by AEH [69]. The preliminary evaluation model for AEH authoring systems, which arises partly from the literature study, is updated during several test cases on MOT, AHA! and WHURLE, using a test course on transistors. As a result there is not only a generic model for the evaluation of AEH authoring systems, but there are evaluation results of the three systems as well. The generic nature of the final evaluation framework is guaranteed by the fact that it is based on the concerns of all stakeholders. The literature study, which gives a view on the complete field of AEH authoring, the existing evaluation frameworks and their shortcomings, and the tests on the different experimental systems provide the basis for the criteria. Because this research is a practice-oriented design research [73], it is important that the problem is properly identified and defined. This has already been handled in the preceding sections. A possible pitfall in this kind of research is a lack of awareness of the origin of the main problem and the connections it has with other problem areas. In order to tackle this obstacle, the literature study forms an essential part of the research. It explores the field of research and describes the backgrounds of the different areas. The research objective is described in the right form and contents. The form is: "Realising an assessment of the adaptivity in AEH authoring applications, by creating an evaluation framework". The rightness of the contents is established by the fulfilment of the four criteria usefulness, feasibility, distinctness and informativeness. The use is made clear in section 1.1. 
On the one hand this research presents a literature study and an evaluation tool, a valuable contribution to the field. On the other hand, this research assists in evaluating the three test systems of the Minerva project. As already stated, a possible danger is that the project grows too large. In order to remain feasible, clear borders must be set, as described in this chapter. The goal of the project is very specific, namely developing an evaluation framework. This chapter clearly describes the knowledge necessary to obtain results, making an informative research goal.

1.4 Structure of the report

The first four research subquestions concern the theoretical background of this research. A view on the status of current AEH technology, including a description of the three Minerva/ADAPT systems (MOT, AHA! and WHURLE) is presented in the next chapter. This includes a description of the issues in these research areas. The third chapter describes literature on software quality, several existing frameworks for AEH evaluation systems and other background on the evaluation of AEH authoring systems. The fourth chapter presents the criteria for evaluating AEH authoring systems, resulting from the literature study. A method for selecting the criteria is also described. A generic evaluation framework is presented. It is based on the shortcomings of the existing ones. The evaluation model is also developed by means of applying it to MOT, AHA! and WHURLE. These tests are discussed in chapter five. Finally conclusions are drawn and recommendations on the field of AEH are given in chapter six.

CHAPTER 2. AEH authoring systems

This chapter begins with a view on the status of Adaptive Educational Hypermedia technology, separated in sections on adaptive hypermedia, educational systems and AEH authoring applications. The subsequent section describes three examples of AEH authoring systems: MOT, AHA! and WHURLE. These are the systems developed under the umbrella of the Minerva/ADAPT project. This chapter concludes with some remarks in the research on AEH authoring systems. The remarks in the last section of this chapter result, logically, from the descriptions of the different fields of research and the three systems described in this chapter. They form a bridge to the fourth chapter, where criteria for evaluating AEH authoring systems are given. These criteria originate from the issues described in this chapter.

2.1 Authoring Adaptive Educational Hypermedia

The research on Adaptive Educational Hypermedia (AEH) has two pillars: adaptive hypermedia and educational systems. Adaptive hypermedia are applications best thought of as websites that automatically personalise themselves to users. A website is, in fact, an example of a hypermedia system. Educational systems are software applications specifically written for learning purposes. A few years ago, both educational and (adaptive) hypermedia software were presented on CD-ROM. The internet embraced these systems and they adapted themselves to the new challenges and possibilities offered. The rise of the WWW also caused the research on AEH to grow explosively. According to Cristea and Garzotto [36] it is obvious that both adaptive hypermedia and educational systems have certain advantages over other systems. Traditionally, educational systems are often focused on course management and collaborative issues; however, they have to do without the personalisation needed for individual learning. The main feature of adaptive hypermedia that distinguishes it from normal hypermedia is the ability to serve each user individually. It performs a process called 'adaptation', in most cases on the basis of user models. The combination of the two systems described forms the basis for the success of AEH applications, now and in the future [15]. There are mainly two types of users in any AEH application. The first one is the student; the research in this report does not apply to him. The second type of user is the teacher or course designer: the author of the course. Authoring in this context means organising course materials and creating the intelligence that ensures the adaptiveness of the application.

2.1.1 Adaptive hypermedia

Adaptive hypermedia (AH) is a relatively new direction of research on the crossroads of hypermedia (formerly known as hypertext) and user modeling [15]. A hypermedia system consists of information items such as documents or animations connected to each other by means of so-called hyperlinks. These links between content items (called nodes) are an essential part of the hypermedia concept.
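
The node-and-link structure described above can be pictured as a small directed graph. The following is a minimal sketch only: the class `Node`, the function `reachable` and the sample course nodes are hypothetical names chosen for illustration and are not taken from MOT, AHA! or WHURLE.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Node:
    """A content item (hypothetical structure), e.g. a document or an animation."""
    node_id: str
    title: str
    links: List[str] = field(default_factory=list)  # ids of the nodes this item links to


def reachable(nodes: Dict[str, Node], start: str) -> Set[str]:
    """Return the ids of all nodes a reader can reach from `start` by following hyperlinks."""
    seen: Set[str] = set()
    stack = [start]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(nodes[current].links)
    return seen


# Illustrative course: 'appendix' exists but no other node links to it.
course = {
    "intro": Node("intro", "Introduction", ["transistor", "glossary"]),
    "transistor": Node("transistor", "Transistor basics", ["circuit"]),
    "circuit": Node("circuit", "Transistor circuit", ["intro"]),
    "glossary": Node("glossary", "Glossary"),
    "appendix": Node("appendix", "Appendix"),
}
```

In this sketch, a node that no hyperlink points to (here the appendix) cannot be reached by browsing, which illustrates why the links are an essential part of the hypermedia concept.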

Figure 2.1: Adaptive Hypermedia

Most of the existing hypermedia are built in a certain way and presented that way to all different kinds of users. It is obvious that different users have different interests, different browsing styles and so on. So the more users use a certain hypermedia system, the more need there is to make different 'views' for certain (groups of) users. The ideal situation would be to serve each user with content items and links specially prepared for him. This implies that the contents and the links available on the server should be prepared in such a way that differentiation is made possible and preferably kept simple. Of course, the latter only applies when it is useful to have a dedicated presentation for each user. A corporate website, for instance, is often initially a global information point for every user, either an individual or a company. Only when more information is requested would such a site offer specialised contents. The process of serving each user personalised contents in hypermedia systems is referred to as adaptation [8]. An adaptive system is a system capable of adapting itself to each user. It can create and update user models, in order to keep abreast of each user's progress. An adaptable system is tunable by the user himself. Most new AEH systems are both adaptable and adaptive. Initially, the user can set up some options himself. In a later stage, however, the system provides the adaptation automatically.
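
The difference between the adaptable and the adaptive part of a system can be illustrated with a small user model. This is a hedged sketch, not the user model of any of the three systems: the class `UserModel`, its fields, and the fixed knowledge gain per visit are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class UserModel:
    """Hypothetical user model combining an adaptable and an adaptive part."""
    # Adaptable part: options the user sets himself.
    preferences: Dict[str, str] = field(default_factory=dict)
    # Adaptive part: knowledge estimates (0.0 to 1.0) maintained by the system.
    knowledge: Dict[str, float] = field(default_factory=dict)

    def set_preference(self, name: str, value: str) -> None:
        """Adaptability: the user tunes the system explicitly."""
        self.preferences[name] = value

    def record_visit(self, concept: str, gain: float = 0.5) -> None:
        """Adaptivity: the system updates the model from observed behaviour.

        The fixed gain per visit is an assumption; real systems use their own rules.
        """
        current = self.knowledge.get(concept, 0.0)
        self.knowledge[concept] = min(1.0, current + gain)


model = UserModel()
model.set_preference("font_size", "large")   # set once by the user
model.record_visit("transistor")             # updated automatically while browsing
model.record_visit("transistor")
```

The preference changes only when the user asks for it, whereas the knowledge estimate changes as a side effect of using the course, which is the distinction the text draws.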

Brusilovsky [15] presents an extensive view of the taxonomy of adaptive hypermedia technologies in his landmark work Adaptive Hypermedia. The two main ways to create adaptivity are through adaptive navigation support and adaptive presentation. The methods and techniques giving shape to these technologies are first mentioned in his earlier work [14], which also gives an explanation of most of them. Figure 2.2 shows the taxonomy of adaptive hypermedia technologies as defined by Brusilovsky.

Figure 2.2: Taxonomy of adaptive hypermedia technologies [15]

Adaptive presentation is the first class of adaptation. On the basis of a user model, which is based on the user characteristics and his progression in the course, the contents of the page presented to the user alter. This can be done in various ways. Contents, or fragments of contents, can be conditionally included or removed, their order can be changed, a multimedia item can be added, a piece of text can be highlighted, dimmed or stretched out when hovering over it; all according to the state the user model is in. The purpose is to challenge qualified users by presenting more (profound) information and not to stress beginners with knowledge that goes beyond their scope. Instead, beginning students, whose knowledge level on a subject does not comply with a certain standard, are presented with more explanatory information. At the same time, experienced users are not bothered by a lot of detailed information they already possess. Natural language adaptation is more or less the same as canned text presentation; it uses some of the same techniques. At the time of writing, the technique could not be classified further. Adaptation of modality is a high-level content adaptation technology that allows the creator of the courseware to choose between different types of media to present materials. In addition to traditional text, videos, applets or speech can now also be used to present information to the user. Adaptive navigation support is guiding the user through the course materials in a personalised way, which can even be changed during the course. In practice, this means links to other parts of the course can be disabled or even removed, or, quite the reverse, emerge. The order of the navigational links can be changed, or the user can be guided through without even seeing, let alone clicking, a link. All of this adaptation is done according to the information gathered and processed in the user model. 
An example would be a course of which the outline is shown on a website and where the different parts become conditionally clickable, and thus viewable, as soon as the student finishes a preceding chapter. The simplest form of adaptive navigation support is direct guidance, which decides for the user the next 'best' node (piece of information). Adaptive link annotation is augmenting a link with some form of comment that gives information on where the link points to. This can, for instance, be done by placing icons next to a link. Global and local maps can be altered to give the user an idea of his course domain; this alteration is called map adaptation. Generating new links falls into three categories: discovering new useful links between documents and adding them permanently to the set of existing links; generating links for similarity-based navigation between items; and dynamic recommendation of relevant links. A possible danger in this process is the chance of creating endless lists that ruin the view of the user. Since the start of AH research, educational hypermedia has been one of the largest areas of interest in the field of AH. Together with on-line information systems, it accounts for about two thirds of the research efforts in adaptive hypermedia [15]. Other areas in AH research include on-line help systems, information retrieval hypermedia (internet browsers and search engines), institutional hypermedia and systems for managing personalised views in information spaces.
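
Two of the techniques described in this section, conditional inclusion of fragments (adaptive presentation) and link annotation with direct guidance (adaptive navigation support), can be sketched in a few lines. The sketch below is an illustration under stated assumptions: the function names, the knowledge-level ranges and the labels "done", "ready" and "hidden" are hypothetical and do not reproduce the rule languages of MOT, AHA! or WHURLE.

```python
from typing import Dict, List, Optional, Set, Tuple


def select_fragments(fragments: List[Tuple[str, float, float]], level: float) -> List[str]:
    """Adaptive presentation: keep fragments whose range [low, high) contains the knowledge level."""
    return [text for text, low, high in fragments if low <= level < high]


def annotate_links(prerequisites: Dict[str, List[str]], finished: Set[str]) -> Dict[str, str]:
    """Adaptive navigation support: label each chapter link according to the user's progress."""
    labels = {}
    for chapter, required in prerequisites.items():
        if chapter in finished:
            labels[chapter] = "done"
        elif all(item in finished for item in required):
            labels[chapter] = "ready"    # clickable: all preceding chapters are finished
        else:
            labels[chapter] = "hidden"   # not yet offered to the student
    return labels


def next_best(prerequisites: Dict[str, List[str]], finished: Set[str]) -> Optional[str]:
    """Direct guidance: pick the first chapter, in outline order, that is ready to be studied."""
    for chapter, label in annotate_links(prerequisites, finished).items():
        if label == "ready":
            return chapter
    return None


# Illustrative data: fragment text with the knowledge range in which it is shown.
fragments = [
    ("Basic explanation", 0.0, 0.5),
    ("Core text", 0.0, 1.01),
    ("Advanced derivation", 0.5, 1.01),
]
# Illustrative outline: each chapter lists the chapters that must be finished first.
outline = {"ch1": [], "ch2": ["ch1"], "ch3": ["ch1", "ch2"]}
```

With these sample values, a beginner at level 0.2 would see the basic explanation and the core text, while a student who has finished only the first chapter would be guided to the second. The same user model state drives both the page contents and the links, which is the common idea behind the two classes of adaptation.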

2.1.2 Educational systems

The main purpose of any educational (computer) system has always been and will always be to support or even replace the classical human tutor [36]. Until recently, most educational systems could only offer supportive or collaborative tasks, like courseware management, calendars and email support. Examples of these systems are WebCT, Blackboard and TeleToP. As stated in chapter 1, the challenge for teachers lies in bridging the gap between the static domain of the course materials and the dynamic world of the student. The teacher has to fulfil the needs of each student individually, taking into account, among other things, the cognitive capabilities, the knowledge levels and the personality of the student. Any educational system in the future should deal with this challenge as well. (Footnote here: On-line information systems are, e.g., electronic encyclopedias, eCommerce systems and handheld guides)

The educational software world has a very clear understanding of the need for component-based programming in its field [68, 52]. Components are essential for moving towards a set of sharable and reusable course concepts. On the one hand, domain experts (teachers) can easily put together different building blocks of lesson material according to pedagogical strategies, thus avoiding technological issues and all the possible problems and difficulties. On the other hand, the technological experts do not have to bother about what material or what learning plan to use, because the component-based architecture makes these issues transparent for them. Educational software is different from other software in the sense that the educational world, through its teachers, hardly ever voluntarily introduces technology in its business. Most of the time, IT implementation in schools is a top-down management requirement. A side effect is that designers of educational (authoring) software have difficulties developing new applications, as they do not receive full support from the people they are designing for.

2.1.3 Adaptive educational hypermedia

As stated earlier, the question is not whether adaptivity should be added to educational systems, but in what form. Possible issues which arise with this question are what kind of adaptation should be applied, to what it should respond, and when and how it should be introduced. In analogy with Cristea [22], who came up with the issues just mentioned, this report will get around the 'problem' of applying the adaptation by embracing the solution of authoring [27, 17, 16]. This shifts the pedagogical power to the people who should have it: the teachers and course creators. Thus, developing authoring tools for AEH is the main issue in (educational) AH at this time. One of the most important developments in AEH research has been the move from static to web-based hypermedia [15]. The internet provides both a challenging and an attractive platform for researchers to develop new AEH systems. One of the reasons for this is the infinitely larger set of resources the internet offers compared with a stand-alone application. Furthermore, the internet is well known for its struggle for standards. In order to create communication between systems and to render course materials reusable and shareable, the research community will at some point have to agree upon certain standards.

2.2 MOT, AHA! and WHURLE

In this section the three systems for AEH authoring resulting from the research in the Minerva/ADAPT project are described. These are MOT, AHA! and WHURLE. Each system is given a brief introduction, after which the theory behind the system is explained. Brusilovsky's taxonomy mentioned in section 2.1.1 will be used to describe the adaptation techniques used in each of the systems.

2.2.1 MOT

MOT [23, 25, 31, 32] is an authoring system for adaptive educational hypermedia developed at the Eindhoven University of Technology. MOT descends from MyEnglishTeacher, or MyET, "a web-based, agent-based, long distance teaching environment for academic English" developed in 2000 in Japan by Cristea and others [39]. Since the start of the Minerva/ADAPT project, there have been many new versions of MOT, starting with MyET. One of the most important aspects the developers of MOT are trying to establish is the separation of content items (course materials) and adaptation rules within the system [38]. This guarantees the most efficient reuse of course materials and enables the application of AH adaptation rules.

The basics of MOT

MOT is based on the LAOS model [34], a 5-layer adaptive authoring model for adaptive hypermedia, and LAG [29, 41], a 3-layer adaptation model. Both these models have been partly developed by the same research group that developed MOT [38]. The LAOS model introduces granularity in authoring levels, which supports the separation of information items and adaptation rules. This is necessary for the presentation of alternative contents. The LAG model provides granularity in adaptation rules and is integrated in the Adaptation Layer. MOT functions on the basis of concepts and lessons; information is stored in concepts, lessons can point to one or more concepts to publish this information. A concept can be seen as a book, whereas a lesson is a presentation of one or more books, or of parts of a book.

LAOS. The idea behind LAOS, Layered Adaptive hypermedia authoring model and its algebraic Operators, is to create a more flexible and powerful system for presenting information to different users and to let authors create more adaptive lessons, in a more adaptive environment [34]. Originally created as a 3-layer model [27], LAOS eventually became the 5-layer model of table 2.1. The main idea is to group the elements according to their possible usage, for later reuse [26].

Table 2.1: The origin of LAOS

The Domain Model (DM) originates from the Conceptual Layer (CL), containing both atomic and composite concepts. This first layer is where the actual course information is stored. As shown in figure 2.3, a concept can contain child concepts. Each (child-) concept contains at least one attribute called title and it can contain more attributes, like text, an introduction or self-defined attributes. Within these attributes it is possible to store HTML. This allows authors to add just plain text to a content item, but also tables, images, videos, applets, and so on.

The hierarchical structure of concepts is implemented by means of a separate concept hierarchy entity, relating a super-concept to one or more sub-concepts. The relationship of concepts is based on commonalities between concept attributes.

Figure 2.3: The conceptual structure of MOT

The Lesson Layer (LL) was replaced by the Goal and constraint Model (GM). The intermediate authoring step of adding the GM is necessary to build 'good' presentations; goals to give a focused presentation and constraints to limit the search space. In MOT, the goals and constraints are given by lesson constructions. A lesson contains sub-lessons which, in turn, are lessons, hence creating a hierarchical structure of lessons. A lesson attribute contains one or more concept attributes. This is the link with the concept domain. The idea is that the lesson combines pieces of information that are stored in the concept attributes in a suitable way for presentation to a student. One of the lesson attributes contains a holder which contains the actual sub-lessons in a specified order.
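
The relation between concepts and lessons described above can be made concrete with a small data-structure sketch. It is not MOT's actual implementation (MOT stores this information in a database); the class names Concept and Lesson, the field names and the render method are assumptions chosen for illustration. The sketch only shows that course information lives in concept attributes, and that a lesson is an ordered hierarchy of sub-lessons, each pointing at zero or one concept attribute.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Concept:
    """Domain model item: named attributes plus child concepts."""
    attributes: dict
    children: list = field(default_factory=list)

    def __post_init__(self):
        # Every (child-)concept contains at least the attribute 'title'.
        if "title" not in self.attributes:
            raise ValueError("a concept needs a 'title' attribute")


@dataclass
class Lesson:
    """Lesson item: ordered sub-lessons; 'source' optionally points at
    one concept attribute as a (concept, attribute name) pair."""
    source: Optional[tuple] = None
    sub_lessons: list = field(default_factory=list)

    def render(self):
        """Collect the referenced attribute values in lesson order."""
        parts = []
        if self.source is not None:
            concept, name = self.source
            parts.append(concept.attributes[name])
        for sub in self.sub_lessons:
            parts.extend(sub.render())
        return parts


xml = Concept({"title": "XML", "text": "XML is a markup language."})
xslt = Concept({"title": "XSLT", "introduction": "XSLT transforms XML."})
xml.children.append(xslt)

lesson = Lesson(sub_lessons=[
    Lesson(source=(xml, "title")),
    Lesson(source=(xml, "text")),
    Lesson(source=(xslt, "introduction")),
])
print(lesson.render())
```

Because the lesson only refers to concept attributes, the same concept can be reused in several lessons without copying its contents, which is the kind of reuse the separation of layers is meant to support.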

The Student Adaptation and Presentation Layer (SAPL) is extended to a User Model (UM), an Adaptation Model (AM) and a Presentation Model (PM). The UM is designed in conformity with the DM and the GM and thus contains concepts and relationships between concepts. In MOT, the user model does not necessarily need to be an overlay of the domain model, because other relationships between variables in the user model can exist next to those between the concepts matching those of the domain model. The adaptation model is implemented through the LAG model, discussed in the next section. The presentation model must take into account the physical properties and the environment of the presentation and provide the bridge to the actual code generation for the different platforms.

In the ideal situation, all the layers in the old 3-layer as well as in the LAOS model should be supported by the Adaptive Engine (AE). The higher the degree of support offered by the AE, the more adaptivity can be applied. As stated, the LAOS model is used to separate the levels of specification. To support this, MOT functions on the basis of a MySQL database, whose table separation scheme instantiates the logical separation of the different levels in the LAOS model. By contrast, XML does not provide the solid data representation a database offers. However, XML is good for representing hierarchical data without many cross-relations. Because of this, and because MOT aims at offering course materials to other AEH systems, the actual output for the adaptation engine is in HTML format. In the specific case where MOT transfers course materials to AHA!, XML is generated.

LAG. The functionality of the adaptation model was improved by integrating a 3-layer model into LAOS, called LAG (three Layers of Adaptation Granularity) [38, 29]. The idea of the three layers is that the medium and the high level function as wrappers for the low and the medium level respectively; each presents the functions of the level below it in a simpler way to the level above. The lowest level contains direct adaptation rules, like IF-THEN statements or condition-action (CA) rules. The medium level consists of an adaptation language; this is a kind of programming language which, after compilation, outputs a set of adaptation rules. The highest level deals with adaptation strategies; these are a kind of function call for the adaptation language and can represent pedagogical or cognitive styles.
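
The wrapping of levels can be sketched in a few lines. This is an illustrative assumption, not the LAG language itself: the functions make_rule, show_when_known and sequential_strategy, the threshold of 50 and the 'visible' entry in the user model are all hypothetical. The lowest level is a plain condition-action rule, the medium level is a single language construct that expands into such rules, and the highest level is a strategy that calls the construct repeatedly.

```python
def make_rule(condition, action):
    """Low level: a condition-action (CA) rule over a user-model dict."""
    return (condition, action)


def apply_rules(rules, user_model):
    """Fire every rule whose condition holds for the user model."""
    for condition, action in rules:
        if condition(user_model):
            action(user_model)
    return user_model


def show_when_known(prerequisite, target, threshold=50):
    """Medium level: one (hypothetical) language construct that
    'compiles' into a list of low-level CA rules."""
    return [make_rule(
        lambda um: um.get(prerequisite, 0) >= threshold,
        lambda um: um.setdefault("visible", set()).add(target),
    )]


def sequential_strategy(concepts, threshold=50):
    """High level: a strategy that reveals concepts one after another,
    expressed as repeated calls to the medium-level construct."""
    rules = []
    for previous, current in zip(concepts, concepts[1:]):
        rules.extend(show_when_known(previous, current, threshold))
    return rules


rules = sequential_strategy(["intro", "basics", "advanced"])
user_model = {"intro": 80, "basics": 10}
apply_rules(rules, user_model)
print(user_model["visible"])
```

An author working at the highest level only chooses the strategy and the list of concepts; the individual rules are generated and never have to be written by hand.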
As MOT is meant to be a system for the (adaptive) authoring of adaptive educational hypermedia systems, the presentation layer (PM) is not really worked out. There is a Student View, but this really is nothing more than a fixed listing of all the concepts, where all adaptation already has been applied. The idea behind MOT is to present a user-friendly authoring system for creating lessons, which can be transferred to other adaptive educational (delivery) systems such as AHA! or WHURLE.
Figure 2.4 shows the integration of LAG in LAOS, LAOS in MOT and MOT in AHA!. The latter is one of the latest results in the ADAPT project: the coupling between MOT and other systems.

Figure 2.4: LAG, LAOS, AHA! and MOT [38]

In the class diagram of MOT (Figure 2.5) we see the splitting of concepts (Domain) and lessons (Course). The most interesting link is the connection between concept attribute and sub-lesson. Each attribute is used in one sub-lesson and every sub-lesson uses zero or one attributes for its contents.

Figure 2.5: Class diagram of MOT [31]

Adaptation techniques in MOT

The adaptive hypermedia methods and techniques present in MOT [25] (see figure 2.6) can be found in Brusilovsky's taxonomy as described in section 2.1.1. The methods and techniques are either present or not; there is no way to partially implement a certain technique. If a certain method or technique is present in MOT, it is printed in bold face in the taxonomy. They have been identified in cooperation with the developers of the system.

Figure 2.6: Adaptation techniques in MOT

2.2.2 AHA!

The AHA! system, or "Adaptive Hypermedia Architecture", was designed and implemented at the Eindhoven University of Technology, and sponsored by the NLnet Foundation through the AHA! project, or Adaptive Hypermedia for All (Footnote here: The information in this section is mainly derived from [6] and [11]). AHA! is an open source general purpose adaptive hypermedia system, through which very different adaptive applications can be created. AHA! was originally developed by De Bra and others in 1996/1997 to support an on-line course with some user guidance through conditional (extra) explanations and conditional link hiding.

Adding adaptivity in AHA! is normally achieved by creating concepts and concept relationships. This is the way best supported by AHA!'s authoring tools. Adaptivity can, however, be added through other means, by low-level and/or advanced features. An example of a low-level authoring tool is the Concept Editor, which makes it possible for an author to add adaptation rules by hand.

The Concept Editor and other authoring tools are discussed later on in this section.

The basics of AHA!

For the most part AHA! works as a web server. Users request pages by clicking on links in a browser. AHA! delivers the pages that correspond to these links. However, in order to generate these pages AHA! uses three types of information: the domain model, the user model and the adaptation model.

The domain model (DM) contains a conceptual description of the application's contents. It consists of concepts and concept relationships. Concepts can be used to represent topics of the application domain, for instance, subjects to be studied in a course. In AHA!, every page that can be presented to the end-user must have a corresponding concept. A concept, though, does not necessarily need a corresponding page. Concepts are linked to each other by relationships; one concept can, for instance, be a prerequisite for another concept. In AHA!, prerequisite relationships, which are predefined, result in changes in the presentation of hypertext link anchors. Every concept, except for the uppermost parent concept, has a parent concept. All concepts can contain child concepts and can have siblings.

Figure 2.7: Class diagram of concepts in AHA!

The adaptivity in AHA! is based on concepts stored in the User Model (UM). The UM is updated each time a user visits a page, which is related to a concept. A concept contains several attributes (and attribute values) such as, for example, a knowledge level indicator. These attributes (their values) can be updated at a visit and then be propagated to attributes of other concepts. In this way, more, or other, information (stored in other concepts) becomes available. The UM is an overlay model, which means that for every concept in the DM there is a concept in the UM. Besides this, the UM can contain additional concepts that have no meaning in the DM.
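
The overlay idea and the propagation of attribute updates can be sketched as follows. This is a simplified illustration and not AHA!'s implementation: the class OverlayUserModel, the single 'knowledge' attribute, the child-to-parent dictionary and the assumption that half of each change is passed on to the parent are all hypothetical choices.

```python
class OverlayUserModel:
    """Overlay user model: one knowledge value per domain concept."""

    def __init__(self, domain_concepts, parents):
        # Overlay: every concept in the domain model gets an entry here.
        self.knowledge = {name: 0 for name in domain_concepts}
        # Hypothetical hierarchy, mapping each child to its parent.
        self.parents = parents

    def visit(self, concept, gained=100):
        """Update a concept on a page visit and propagate the change
        upwards (assumed: half of the change per level, capped at 100)."""
        old = self.knowledge[concept]
        new = max(old, gained)
        self.knowledge[concept] = new
        delta = new - old
        parent = self.parents.get(concept)
        while parent is not None and delta > 0:
            delta //= 2
            self.knowledge[parent] = min(100, self.knowledge[parent] + delta)
            parent = self.parents.get(parent)


um = OverlayUserModel(
    ["course", "chapter1", "page1", "page2"],
    {"page1": "chapter1", "page2": "chapter1", "chapter1": "course"},
)
um.visit("page1")
print(um.knowledge)
```

Visiting a second page of the same chapter raises the chapter's value further, which is how other information can become available once enough of the underlying pages have been read.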

The UM always contains an adaptable concept called personal, which contains attributes describing the user. Each user can personally adapt the values of the attributes stored in the personal concept. Obvious attributes are login information and addresses; more interesting attributes are, for instance, knowledge level indicators of a certain research topic.

The adaptation model (AM) is what drives the adaptation engine. It defines how user actions are translated into user model updates and into the generation of an adapted presentation of a requested page. The AM consists of adaptation rules that are actually event-condition-action rules. Figure 2.8 shows the architecture of AHA! and the three models. As can be seen, Java servlets interact with the combined domain/adaptation model and with the user model, in order to carry out the adaptation rules that perform the updates of the user model.

Figure 2.8: The architecture of AHA! [6]

Whenever a concept (or a concept corresponding to a page) is requested, AHA! starts by executing the rules associated with the access attribute (Footnote here: The access attribute is a system-defined attribute that is used specifically for the purpose of starting the rule execution. ). After this, all the other rules are executed and the attributes of the associated concepts are updated accordingly. Now AHA! starts processing the requested page. This involves the conditional inclusion of fragments or objects, the hiding or annotation of links and, of course, passing all other contents to the browser. The result is a unique page, suited for a single user.
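
The rule execution described above can be illustrated with a small event-condition-action sketch. The rule format below is an assumption made for this report and does not reproduce AHA!'s actual rule syntax; the names Rule and access, the dictionary-based user model and the max_steps safeguard are hypothetical. A page request raises an 'access' event, and every update made by a rule is treated as a new event that can trigger further rules.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    event: tuple          # (concept, attribute) whose change triggers the rule
    condition: Callable   # user model -> bool
    action: Callable      # user model -> list of (concept, attribute, value)


def access(concept, user_model, rules, max_steps=100):
    """Raise the 'access' event for a concept and keep executing
    triggered rules until no further updates are produced."""
    queue = [(concept, "access")]
    steps = 0
    while queue and steps < max_steps:  # guard against endless rule loops
        event = queue.pop(0)
        steps += 1
        for rule in rules:
            if rule.event == event and rule.condition(user_model):
                for target, attribute, value in rule.action(user_model):
                    user_model[(target, attribute)] = value
                    queue.append((target, attribute))
    return user_model


rules = [
    # Accessing page1 sets its knowledge value.
    Rule(("page1", "access"),
         lambda um: True,
         lambda um: [("page1", "knowledge", 100)]),
    # Enough knowledge of page1 makes page2 suitable.
    Rule(("page1", "knowledge"),
         lambda um: um[("page1", "knowledge")] >= 50,
         lambda um: [("page2", "suitable", True)]),
]
print(access("page1", {}, rules))
```

After the rules have run, the updated user model is what the page processing step would consult when deciding which fragments to include and which links to hide or annotate.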

The AHA! system uses either XML files or a MySQL database to store the domain, the adaptation and all the user models. Since AHA! is based on Tomcat servlet technology, it should, in theory, be possible to use it with any Java-enabled platform, on any Java-based (Tomcat) web server. As said before, AHA! offers several authoring tools for adding adaptation to selected contents.

The Concept Editor is a graphical, Java applet based tool to define concepts and adaptation rules. It uses an (author defined) template to associate a predefined set of attributes and adaptation rules with each newly created concept. It is a low-level tool in the sense that all adaptation rules between concepts must be defined by the author. Many applications have a number of constructs that appear frequently, e.g., the knowledge propagation from page to section to chapter, or the existence of prerequisite relationships. Defining these by hand leads to a lot of repetitive work for the author.

The Graph Editor is also a graphical, Java applet based tool, but it uses high-level concept relationships. Again, when concepts are created, a set of attributes and adaptation rules is generated. But this tool also has templates for different types of concept relationships (also defined by the author). Creating knowledge propagation, prerequisite relationships or any other relationship is just a matter of drawing a graph structure using this graphical tool. The translation from high-level constructs to the low-level adaptation rules is done automatically, based on the templates.

The Form Editor lets authors create forms, which are used for changeable attributes to be included in an (X)HTML presentation. The end-user is then able to adapt the values of these attributes through the form accompanying a certain application.

Furthermore, there is a module available for adding multiple choice tests to an application. This lets the system automatically select questions and answers to present. The Layout Manager provides authors with the possibility to change the look & feel of a course.

Instead of creating AHA!-specific tools it is possible to let authors develop applications for other adaptive hypermedia systems and translate them to AHA!. Given the fact that AHA! offers a lot of (low-level) functionality this should be possible for many systems.

Adaptation techniques in AHA!

The adaptive hypermedia methods and techniques present in AHA! (see figure 2.9) can be found in Brusilovsky's taxonomy as described in section 2.1.1. As with the taxonomy of those used in MOT, they are boldfaced. All the methods and techniques that are used are pointed out and explained by the developers in literature provided by them [6].

Figure 2.9: Adaptation techniques in AHA!

2.2.3 WHURLE

WHURLE [Footnote 3] (Web-based Hierarchical Universal Reactive Learning Environment) [61] was designed around 2001 at the University of Nottingham to provide a discipline-independent framework that manages easily reusable contents. WHURLE is pedagogically flexible and has the capability of implementing adaptation. An important feature of this framework is that it distinguishes between three types of authors of educational hypermedia: subject experts, teachers and technical authors. The subject experts are responsible for the contents, the teachers for the implementation of those contents and the technical authors for the user model of the adaptation and the appearance and behaviour of the system.

The basics of WHURLE

The student is presented a lesson, consisting of several atomic units called chunks, which are the smallest possible conceptually self-contained units of information that can be used by the system. This can be a text paragraph, a picture, a simulation or even a complete book (which would, of course, not correspond to the purpose of a chunk). Besides chunks, lessons contain a lesson plan, which is a default pathway through the chunks. The lesson plan is filtered by the adaptation filter that implements the user model based upon data stored in the student's user profile.

Figure 2.10 shows the modular system architecture of the WHURLE framework. A composite node tree is constructed from all of the specified chunks, the links and the lesson plan. This is then processed by the adaptation filter (an XSLT stylesheet) which implements the user model. The output document is then processed by the display engine (another XSLT stylesheet) which overlays the skin (the cosmetic appearance) and generates the auto navigation system. The output document of the display engine is the virtual document served to the user. It consists of dynamic HTML and should, therefore, be capable of being displayed in most web browsers.

This architecture allows the adaptation filter, display engine and skin to be modified independently of each other, a facility invaluable both for research and implementation, as now these aspects can be studied and implemented separately. Thus, WHURLE is not tied to any particular user model or interface, both of which can be created by technical authors.
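
The staged processing described above can be sketched as a small pipeline. WHURLE itself implements the adaptation filter and the display engine as XSLT stylesheets; the Python below is only an illustration of the idea that the stages are independent. The function names, the numeric 'level' used for filtering, the format of the lesson plan and the skin template are all assumptions.

```python
def adaptation_filter(lesson_plan, profile):
    """Keep only the plan entries the user profile qualifies for
    (assumed rule: entry level must not exceed the user's level)."""
    return [entry for entry in lesson_plan if entry["level"] <= profile["level"]]


def display_engine(entries, chunks, skin):
    """Overlay the skin on each chunk and generate simple
    previous/next navigation from the filtered plan."""
    pages = []
    for index, entry in enumerate(entries):
        nav = []
        if index > 0:
            nav.append("previous")
        if index < len(entries) - 1:
            nav.append("next")
        pages.append(skin.format(body=chunks[entry["chunk"]], nav=" | ".join(nav)))
    return pages


def serve(lesson_plan, chunks, profile, skin,
          filter_stage=adaptation_filter, display_stage=display_engine):
    """Run the pipeline; either stage can be swapped independently."""
    return display_stage(filter_stage(lesson_plan, profile), chunks, skin)


chunks = {"c1": "What is XML?", "c2": "XML namespaces", "c3": "XML schema design"}
plan = [
    {"chunk": "c1", "level": 0},
    {"chunk": "c2", "level": 1},
    {"chunk": "c3", "level": 2},
]
skin = "<div>{body}</div><p>{nav}</p>"
for page in serve(plan, chunks, {"level": 1}, skin):
    print(page)
```

Replacing filter_stage with another function changes the user model that is applied without touching the display stage or the skin, which corresponds to the separation of concerns noted above.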

Figure 2.10: The architecture of WHURLE [12]

WHURLE contains three distinct linking systems, serving distinct purposes. The first category consists of the intra-chunk links (i.e., links within individual chunks) that are created by the original chunk author and form a part of the chunk. An example would be a chunk containing a question along with a link to another part of the chunk or even to another chunk containing the answer. Secondly, systematic links are automatically generated by the system to provide navigational facilities, based upon the structure of the lesson plan, and the user interface of the learning environment, based upon definitions in the skin. Finally, authored links are manually created by teachers or students and are stored in one or more link bases, separate from the contents. They can point to any available resource: to another chunk in WHURLE or to anything on the web.

WHURLE is an XML system currently implemented using XSLT that is processed on the server, delivering dynamic HTML. In order to minimise the processing overhead, XInclude is used to retrieve chunks as they are required. XSLT and XInclude are processed using the Cocoon XML publishing framework. The user profiles are stored in a MySQL database, chunks and lesson plans are XML files, and configuration and navigation information is specified as request parameters of the URI.

Adaptation techniques in WHURLE

The adaptive hypermedia methods and techniques present in WHURLE (see figure 2.11) can be found in Brusilovsky's taxonomy as described in section 2.1.1. The developers of WHURLE describe the methods and techniques they use for adaptation in several publications [12, 62].

Figure 2.11: Adaptation techniques in WHURLE

Footnote 4: The information in this section is mainly derived from [13, 12, 62] and [71]

2.2.4 Summary

Having discussed the three systems MOT, AHA! and WHURLE, this section continues with an overview of the adaptation methods and techniques used within the three systems. Table 2.2 shows the adaptation techniques as defined by Brusilovsky (see section 2.1.1). MOT and AHA! together cover almost all the adaptation techniques; WHURLE functions in this sense as a sort of reference system. When analysing these systems, no consideration is given to the fact that systems might use each other's contents and techniques. These aspects of interoperability are treated in the following chapter; here only aspects typical of the systems themselves are analysed.

Brusilovsky's 2001 taxonomy of adaptation techniques [15] covers all possible ways to express adaptivity in an application for adaptive (educational) hypermedia. Every method or technique used in MOT, AHA! or WHURLE fits in the formulated classification. An AEH system does not have to use every adaptive technique to be a good system, and having implemented all techniques does not make a system the 'best'. It just ensures a more complete system, as it is able to present the same contents and the same level of adaptation in more ways.

Many of the techniques mentioned are simply instances of the same 'mother' technique. Natural language adaptation, for instance, is just the same as canned text adaptation, except for the fact that the information chunks are spoken instead of written. It should not cost that much effort to implement spoken language in an existing AEH application, as the adaptation rules are already defined and the only issue is to update the links that lead to the correct chunks; they can now also lead to other chunks.

As can be seen in table 2.2, the techniques most commonly used are inserting, removing and sorting of fragments and adaptive link hiding.
The first three are presentation techniques; bits and pieces of text, visuals, sound fragments and such are presented to the user according to the state of, among other things, the user model and the adaptation model. They can be in a different order for different students, extra information can be shown in case a student's knowledge level does not match a certain limit, or contents can be removed from the presentation completely if a student is already familiar with the information. The hiding, disabling and removal of links to other content items, examples of adaptive navigation support, take place in the same way. If, for instance, the system concludes that a student's knowledge or his abilities do not comply with a certain level, it presents links to parts of the course materials that can help the student to improve his knowledge.

The techniques discussed in the last paragraph are the simplest, or standard, adaptation techniques. This is why most developers use them as the basic techniques of their adaptive system. One could argue it is almost impossible to develop an AEH application without incorporating these techniques.

Certain adaptation techniques are not often used; they make systems like MOT, AHA! and WHURLE unique. The usage, or just the absence, of certain techniques could say something about the 'quality' of a system. However, statements made in this chapter are only preliminary. The real evaluation is discussed in later chapters. Adaptive multimedia presentation is a technique only adopted by MOT, through which teachers are enabled to include multimedia objects in their lessons, instead of just text. A premature statement would be that AHA! and WHURLE are more complete, offering the possibility of using more dimensions. The same applies to natural language adaptation. As stated before, none of the three systems uses this technique.
The absence of alteration and dimming of fragments, just like that of stretchtext, does not seem to be a major limitation for systems which fail to incorporate them, because each system offers a lot of other adaptation techniques to present course materials. Adaptation of modality, however, a feature only adopted by MOT, allows the teacher to let the system choose between different dimensions of information chunks to present a concept. For some students, a picture tells more than a thousand words, while for other students this just might not be the case. This could be an issue on which both AHA! and WHURLE are up for improvement.

Table 2.2: Overview of the adaptation techniques used in MOT, AHA! and WHURLE.

The fact that WHURLE does not use direct guidance as a technique to lead students through the lessons could be a drawback. The same holds for the absence of adaptive link sorting, which seems to be a very good way to express the importance of certain course materials. Only AHA! uses adaptive link annotation and map adaptation, two forms of adaptive navigation support. The first one seems to be just another way to do the same trick, so it would do no harm to ignore it. Usually, however, it is thought better to give the student as many signals as possible, to ease the process. In this sense, it would be a gain to annotate the usefulness of links with, for instance, (colored) icons. Map adaptation gives the impression of being a quite valuable method to give the student some idea of his position in the course. Dynamic, or adaptive, link generation is at first sight a rather useful method to expand the collection of links, so it could well be an advantage for AHA!.

2.3 Remarks on AEH authoring

This chapter has so far given a view on the research on authoring systems for adaptive educational hypermedia. It has also described the three systems MOT, AHA! and WHURLE, which form the offspring of the research carried out within the Minerva/ADAPT project. This section presents the issues that result from the research on AEH authoring systems. As stated before, according to Devedzic [43] there are conceptual, technological and tools-related issues.

On a conceptual level, course materials created in an AEH system should be made reusable and should be available to other AEH systems. The problem is that most such systems today use different formats, languages and vocabularies for representing and storing the course materials, as well as the teaching strategies they apply, the assessment procedures and the student models. Hence there is generally no way for two different educational applications to interoperate and share their teaching and learning contents, even if they are in the same domain.

An issue of a technological order is the slow acceptance of new technologies. XML/RDF languages, for instance, provide a more-or-less standardised syntax, offer a high degree of interoperability between applications and are widely used. The educational world, though, still mainly uses HTML to create educational hypermedia; HTML does not have the functionalities mentioned.

The third order of issues Devedzic signals is about the incapability of teachers to deal with the tools provided. It is difficult for authors who are not experts in web page design to create pages of a web-based educational application by using current authoring tools, such as TopClass, WebCT or Authorware.

Known problems in the field of adaptive hypermedia are how to adapt, when to adapt, what to adapt, to whom to adapt and what kind of adaptation to apply. Furthermore, AH researchers complain about having no challenging contents and structures large enough for the field.
The educational software world, on the other hand, has always felt the lack of personalisation in the existing educational systems. Another issue in this field is the match between educational components and virtual concepts. An issue in the field of AEH research has always been the question whether or not to make a system suitable for the internet. Garzotto and Cristea [36] state that the main issue of AEH lies in the design. The proper design, they claim, is the first step towards common design patterns, leading to a better, semantically enhanced authoring system for AEH. This is necessary in order to overcome the obstacle between the educational specialist (the teacher or course developer) and his often limited knowledge of the environment, and the resulting difficulty in delivering his knowledge.

In chapter 4 the issues described here are translated into criteria for evaluating AEH authoring systems.

CHAPTER 3. Quality assurance in AEH design

The question which immediately arises when discussing the evaluation of AEH or any other software application is: "What is (software) quality?" [67]. As there is no common answer to this question, this chapter establishes the boundaries within which the evaluation is carried out. Crosby [42] nonetheless answers the question posed before as follows:

The problem of quality management is not what people don't know about it. The problem is what they think they do know... In this regard, quality has much in common with sex. Everybody is up for it. (Under certain conditions, of course.) Everyone feels they understand it. (Even though they wouldn't want to explain it.) Everyone thinks execution is only a matter of following natural inclinations. (After all, we do get along somehow.) Most people, of course, feel that problems in these areas are caused by other people. (If only they would take the time to do things right.)

This chapter approaches the quality question from two sides. First, it describes what software quality is, as an AEH system is, of course, a piece of software. At the same time, software quality aspects that can be used to evaluate AEH applications are marked, so they can be carried over to the evaluation criteria described in chapter 4. The second section describes existing frameworks for the evaluation of (educational) adaptive hypermedia systems. This section mentions some other evaluation methods as well, which also provide issues in AEH authoring that can be turned into criteria later on. A distinction has to be made between the assessment and the evaluation of adaptive educational hypermedia [47]. Evaluation is done from a system point of view and focuses on system performance and the system's decision-making capabilities. Evaluating means checking the adaptive decisions made by the system, the efficiency and performance of the system and/or the algorithms the system employs. The term assessment refers to a learner-centered approach to system evaluation. The effectiveness of the system is measured by looking at learner outcomes. In both evaluation and assessment, the primary goal is to determine the effectiveness of the adaptive educational hypermedia system. As stated earlier, this research describes an evaluation of design aspects in particular. Learner aspects are not taken into account. When a user is mentioned, a teacher or course developer is meant. Where chapter 2 describes the methods and techniques used to introduce adaptivity in AEH authoring systems, this chapter charts the underlying technologies and conceptual ideas that enable this adaptivity. 
The adaptation techniques as described by Brusilovsky [15] reflect what is done internally by the system; the user takes no notice of these internal processes and only sees the effects through the adaptation techniques used. (Footnote: The user in this research is the teacher or the course developer, though it could apply to students as well.)

These methods and techniques for adaptive navigation and adaptive presentation support are all in the user domain. They can be seen as the front-office. This is in contrast to the issues and criteria mentioned in this chapter, which are in the architectural domain and, by analogy, are best regarded as the back-office of the system. As already mentioned in chapter 1, there is a need for empirical evaluations of adaptive educational hypermedia systems. The reason for this is that only a few such studies exist. Moreover, the ones that exist are of a rather simple nature and have a small sample size [74]. The former is covered in this research by trying to define a full range of criteria, to cover the concerns of all stakeholders. The latter is not, as that would go beyond the scope of this research. However, evaluations were carried out on MOT, AHA! and WHURLE. The limitation of these evaluations arises from the fact that they were not carried out with a large number of test persons. All of this is explained and worked out in chapter 5. Another form of evaluation is formal testing which, for instance, focuses on the correctness of algorithms. This kind of research is not used in this study. This chapter presents the basis on which the final framework is built. There are several reasons for selecting software quality and existing frameworks for AH evaluation as the foundation. First, there are no evaluation models for AEH authoring applications. The existing literature on this subject only sums up some issues to consider. To deal with this situation, other perspectives are selected on the basis of their relatedness to the subject. In this respect, software quality and AH systems in general are the most closely related subjects. 
One could argue educational systems have to be studied to a larger extent than has been done in chapter 2, but the research described here focuses on design aspects, and not on pure educational aspects, such as learner styles.

3.1 Software quality factors

Regarding software engineering, Pressman [67] wrote a landmark work in which he defines software quality as the conformance to the three measurable aspects below.

1. Software requirements; the foundation to measure quality.
2. Specified standards; resulting in a set of development criteria.
3. Implicit requirements; such as maintainability.

Although these three aspects cover the whole range of software quality, the concept remains rather vague. This section makes the definition of software quality more concrete by identifying several factors that affect software quality. These factors can be measured, either directly or indirectly, and are, as such, useful for the development of an evaluation framework. The factors are derived from the three aspects mentioned above; together they form a decomposition of those aspects. As they are selected by Pressman [67] in his landmark work on software engineering, these factors are considered valuable.

3.1.1 McCall's quality factors

McCall and his colleagues [54] define three categories of factors that affect software quality: operational characteristics, the ability to undergo change and the adaptability to new environments. These three categories each contain several quality factors, which are described below.

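correctness The extent to which a program satisfies its specification and fulfils the customer's mission objectives.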

reliability The extent to which a program can be expected to perform its intended function with required precision.

efficiency The amount of computing resources and code required by a program to perform its function.

integrity The extent to which access to software or data by unauthorised persons can be controlled.

usability The effort required to learn, operate, prepare input and interpret output of a program.

maintainability The effort required to locate and fix an error in a program.

flexibility The effort required to modify an operational program.

testability The effort required to test a program to ensure that it performs its intended function.

portability The effort required to transfer the program from one hardware and/or software system environment to another.

reusability The extent to which a program (or parts of a program) can be reused in other applications -related to the packaging and scope of the function that the program performs.

interoperability The effort required to link one system to another.

As it is difficult, and sometimes even impossible, to directly measure quality factors, McCall defined several metrics in order to grade each factor. The quality factors and the metrics are combined in a checklist, whereby every factor is assigned several metrics. The metrics are listed below:

auditability	error tolerance	operability
accuracy	execution efficiency	security
communication commonality	expandability	self-documentation
completeness	generality	simplicity
conciseness	hardware independence	software system independence
consistency	instrumentation	traceability
data commonality	modularity	training

When adopting the model described above, each metric is given a weight, varying between 0 (low) and 10 (high). On the basis of these measurements, evaluation is performed. This scale is obviously arbitrary as well as dependent on local products and concerns.
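The grading scheme can be illustrated with a small sketch. It assumes that every metric belonging to a quality factor receives a grade between 0 and 10 and a relative weight, and that the factor score is the weighted average of those grades; the weighted average, the function name `factor_score` and all numbers below are illustrative assumptions, not values taken from McCall [54] or from this research.

```python
def factor_score(grades, weights):
    """Weighted average of metric grades (each 0-10) for one quality factor.

    Both arguments map a metric name to a number. The weighting scheme is
    an assumption made for this sketch.
    """
    for name, grade in grades.items():
        if not 0 <= grade <= 10:
            raise ValueError(f"grade for {name!r} must lie between 0 and 10")
    total_weight = sum(weights[name] for name in grades)
    if total_weight == 0:
        raise ValueError("at least one metric needs a non-zero weight")
    weighted_sum = sum(weights[name] * grades[name] for name in grades)
    return weighted_sum / total_weight


# Hypothetical grades and weights for the reliability factor.
reliability_grades = {"accuracy": 8, "consistency": 6, "error tolerance": 4}
reliability_weights = {"accuracy": 2, "consistency": 1, "error tolerance": 1}

score = factor_score(reliability_grades, reliability_weights)
print(f"reliability: {score:.1f}")  # prints "reliability: 6.5"
```

Because the weights are chosen locally, two evaluators can reach different scores for the same system, which is exactly the arbitrariness noted above.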

3.1.2 Hewlett-Packard's quality factors

The Hewlett-Packard company [48] listed some quality factors in a set called FURPS, which is an acronym for the factors it contains.

Quality factor	Metrics
Functionality	feature set, capabilities of the system, generality of the delivered functions, security of the overall system
Usability	human factors, overall aesthetics, consistency, documentation
Reliability	frequency and severity of failure, accuracy of output results, mean time between failures (MTBF), ability to recover from failure, predictability of the program
Performance	processing speed, response time, resource consumption, throughput, efficiency
Supportability	maintainability (extensibility, adaptability and serviceability), testability, compatibility, configurability, ease of installation, ease of problem localisation

The ease of installation metric in particular is interesting, as it does not occur often in evaluation lists though it has been a factor in the research described in this report.

The models in this section are distinct from the existing evaluation frameworks discussed in the following section. A framework extends over several points of view, whereas these models, which contain software quality factors, only cover pure software related aspects. These are separated from other evaluation methods and quality issues on purpose, as they directly descend from the concept of software quality, introduced at the beginning of this section.

3.2 Existing evaluation frameworks

On the evaluation of regular adaptive hypermedia systems, little research has been carried out. Several frameworks were constructed and some of the most recent ones are discussed here. As they are not specifically designed to evaluate educational AH systems, not all quality aspects can be used in this research. The two evaluation frameworks for the evaluation of adaptive educational hypermedia first mentioned in this section both follow a layered approach, which splits the system under evaluation into critical parts, i.e., it separates the input acquisition from the adaptation decision process. Furthermore, both are of an empirical nature. As already stated, this is the kind of research currently most needed. Another reason for selecting these two frameworks is that they are recent. They both present an up-to-date view on the evaluation of (educational) adaptive hypermedia.

3.2.1 The framework of Gupta & Grover

Gupta and Grover present a framework for evaluating AH systems [49] in which they elaborate on the layered approach of AH evaluation. According to them, a layered framework for evaluation offers the best identification of adaptation failures and other errors. The framework they suggest consists of four dimensions: environment, adaptation, development process and evaluation modules. These four dimensions are orthogonal to each other, i.e., all the evaluation modules should address all the components of environment and adaptation during each phase of the development process. Figure 3.1 shows the framework as suggested by Gupta and Grover.

Figure 3.1: Gupta & Grover [49], Proposed Evaluation Framework for AHS

Not all the criteria formulated by Gupta & Grover are used in this research. The following is a short description of the subjects which are used in the research.

The environment dimension covers everything the system has to adapt itself to, such as location, device, and so on. As this research is about evaluating educational AH systems in a research environment, these characteristics are stable and do not need to be taken into consideration. The adaptation dimension has only two possible types: static and dynamic adaptation, depending upon the time and process of adaptation. Static adaptations are specified by the author at the moment of design or determined only once at the startup of the application. Dynamic adaptation occurs during runtime, depending on various factors such as inputs given by the users during use, changes in the user model, adaptation decisions taken by the AHS, etc. An example of a dynamic adaptation system is AHA!. The evaluation modules dimension comes from the research on layered frameworks as proposed by [18] and [66]. Gupta and Grover evaluate them with respect to the other dimensions of the framework. In this layer, the input acquisition has to be checked for correctness, i.e., reliability, accuracy, precision, etc. Along with this, the interpretations the system makes of the given inputs have to be evaluated, for both static and dynamic adaptations. On the basis of this input, the adaptive system draws semantic conclusions, the so-called inferences drawn. These have to be evaluated as well. Based on the inferences drawn, each system uses several models, such as the domain, user, adaptation and presentation models. These models are necessary for achieving the required adaptation. They are supposed to imitate the real world. They need to be evaluated for validity, i.e., correct representation of the entity being modeled, comprehensiveness of the model, redundancy of the model, precision of the model and sensitivity of the modeling process. 
Evaluation of the adaptation decision is done to check whether the system, given a set of properties in the user model, follows the optimal adaptation when more than one adaptation is possible. Criteria used are necessity, appropriateness and acceptance of adaptation. The evaluator should check that more adaptivity does not decrease usability. The presentation is evaluated on the basis of criteria such as completeness, coherence, timeliness of adaptation and user control over adaptation. The development process dimension takes care of evaluating the software life cycle, i.e., analysis, design, implementation and maintenance. Because the three systems described in this report already exist, this dimension can be transformed into a check between the initial and the achieved goals of the systems, with respect to the other three dimensions.
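The difference between the two adaptation types can be made concrete with a minimal sketch. It assumes a user model that stores one knowledge level per concept and an adaptation decision that chooses between two presentation variants; the class names, the threshold of 0.5 and the concept name are hypothetical and are not taken from Gupta and Grover [49] or from any of the three systems.

```python
from dataclasses import dataclass, field


@dataclass
class UserModel:
    # Maps a concept name to an assumed knowledge level between 0 and 1.
    knowledge: dict = field(default_factory=dict)


def select_fragment(model, concept):
    """Adaptation decision: choose a presentation variant for one concept."""
    level = model.knowledge.get(concept, 0.0)
    return "advanced" if level >= 0.5 else "introductory"


class StaticAdaptation:
    """The user model is determined once at start-up and never updated."""

    def __init__(self, initial_knowledge):
        self.model = UserModel(dict(initial_knowledge))

    def observe(self, concept, gain):
        # Runtime events are ignored in the static case.
        pass

    def present(self, concept):
        return select_fragment(self.model, concept)


class DynamicAdaptation(StaticAdaptation):
    """The user model is updated at runtime from the user's actions."""

    def observe(self, concept, gain):
        current = self.model.knowledge.get(concept, 0.0)
        self.model.knowledge[concept] = min(1.0, current + gain)


static = StaticAdaptation({"xml": 0.2})
dynamic = DynamicAdaptation({"xml": 0.2})
for system in (static, dynamic):
    system.observe("xml", 0.4)  # the same user action reaches both systems

print(static.present("xml"))   # introductory
print(dynamic.present("xml"))  # advanced
```

Both systems receive the same user action, but only the dynamic one changes its adaptation decision afterwards.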

The main objective of Gupta & Grover when developing their framework was to present a method to evaluate an AEH authoring application in its complete context. This means the application is not only observed on its adaptive capabilities, but that the evaluation framework integrates the AH development process, the accessing environment, the different levels and types of adaptations involved and the evaluation modules of layered frameworks. Of course, not all these aspects are taken into account in the research described in this report.

3.2.2 The framework of Weibelzahl

Weibelzahl [75, 74] constructed a framework for the evaluation of adaptive systems in his Ph.D. dissertation. He elaborated on the state of affairs in evaluation frameworks for adaptive systems (not necessarily of an educational order) and incorporated the work of about forty researchers in developing his own framework. In doing so, he took into account the layered approach of adaptive systems evaluation, first described by Brusilovsky, Karagiannidis and Sampson [18]. The evaluation framework of Weibelzahl is claimed to be 'more complete' than frameworks developed before. It distinguishes, for instance, input data assessment from inference of user properties, a relevant distinction that should not be overlooked. In developing the framework, Weibelzahl had two objectives in mind. First, he observed the need for specifying what has to be evaluated to guarantee the success of adaptive systems. The second objective he pursued was to have a grid that facilitates the specification of criteria and methods that are useful for the evaluation. The framework focuses on empirical evaluations and not on formal methods, such as verification of algorithms. Weibelzahl states that when evaluating the real-world value of an adaptive system an empirical approach is inevitable. Summarising, he proposes "a systematic approach for the evaluation of adaptive systems that will encourage and categorise future evaluations".

The framework, displayed in figure 3.2, distinguishes four evaluation steps, which can all together be seen as an instance of the layered approach of [18], as each step is a prerequisite to the following steps:

1. Evaluation of reliability and external validity of input data acquisition
2. Evaluation of the inference mechanism and accuracy of user properties
3. Evaluation of adaptation decisions
4. Evaluation of total interaction
- 4.1 System behaviour
- 4.2 User behaviour and usability

Figure 3.2: Weibelzahl [75], Evaluation Framework for AHS

Figure 3.2 shows the architecture of an adaptive system in combination with the information flows between the components of the system. The numbered bullets refer to the four evaluation steps mentioned before. The layered nature of the framework follows from the fact that the evaluation of all previous steps is prerequisite to the current step. For instance, the system probably makes the wrong adaptation decision if it is reasoning on the basis of incorrect user information. The four evaluation steps are described below.

To build a user model the system acquires direct or indirect input from the user (e.g., appearance of specific behaviour, utterances, answers, etc.). These data are the basis of all further inferences. Thus, their reliability and validity are of high importance. This applies to external validity as well. For instance, in adaptive learning systems, visited pages are usually treated as read and sometimes even as known. However, users might have scanned the page only briefly for specific information, without paying attention to other parts of the page. Relying on such input data might therefore cause maladaptations.

Based on the input, properties of the user are inferred. The inference itself is derived in many different ways, ranging from simple rule-based algorithms to Bayesian Networks or Case-Based Reasoning systems. As in the first step, the validity of the inference can be evaluated, too. In fact, this means checking whether the inferred user properties really do exist.

During the so-called downward inference, the system decides how to adapt the interface. Usually there are several possibilities of adaptation given the same user properties. The aim of this evaluation step is to figure out whether the chosen adaptation decision is the optimal one, given that the user properties have been inferred correctly.

The last step evaluates the total interaction by observing the system behaviour and the user behaviour, i.e., the usability and the performance. Several dimensions of the system behaviour may be evaluated. The most important is probably the frequency of adaptation. Moreover, the frequency of certain adaptation types is important, too. The user's behaviour can be evaluated separately and is, in fact, the most important part. The adaptation is successful only if the user has reached his goal and if he is satisfied with the interaction. This final step has to assess both task success (respectively performance) and usability. 
For some systems the performance of the users in terms of efficiency and effectiveness is crucial. For example, the success of an adaptive learning system depends on the user's learning gain (besides other criteria). The evaluation framework developed by Weibelzahl [75] appears to fit effortlessly into the framework of Gupta & Grover [49]. The four steps Weibelzahl mentions roughly match the evaluation modules described by Gupta & Grover. However, Weibelzahl stresses the importance of a really thorough investigation of evaluation criteria. His thorough study [74] gives a comprehensive view on existing evaluation methods and criteria. The results of that study are included in this research.
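The prerequisite relation between the four steps can be sketched as a simple pipeline. The sketch below is one possible reading of the layered approach: a later step is only evaluated when all earlier steps have passed, because its result could otherwise not be interpreted. The function name, the result labels and the example outcomes are assumptions made for illustration; they do not reproduce a procedure prescribed by Weibelzahl [75].

```python
from typing import Callable, List, Tuple

# A step is a name plus a check that returns True when the layer passes.
Step = Tuple[str, Callable[[], bool]]


def run_layered_evaluation(steps: List[Step]) -> List[Tuple[str, str]]:
    """Evaluate the layers in order and stop judging after the first failure."""
    results = []
    blocked = False
    for name, check in steps:
        if blocked:
            # An earlier layer failed, so this layer cannot be interpreted.
            results.append((name, "not evaluated"))
            continue
        passed = check()
        results.append((name, "passed" if passed else "failed"))
        if not passed:
            blocked = True
    return results


# Hypothetical outcomes: the inference layer fails in this example.
example_steps = [
    ("input data acquisition", lambda: True),
    ("inference of user properties", lambda: False),
    ("adaptation decision", lambda: True),
    ("total interaction", lambda: True),
]

for name, outcome in run_layered_evaluation(example_steps):
    print(f"{name}: {outcome}")
```

In this example the adaptation decision is reported as not evaluated: with incorrectly inferred user properties, a poor decision could not be attributed to the decision layer itself.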

3.2.3 Other evaluation methods

The two frameworks described before both follow the layered approach as developed by Karagiannidis et al. [18]. Another way of investigating AEH authoring applications is by comparing them with each other. An example of this feature-by-feature comparison, as it is referred to, is described in [43]. Typical features to study in this kind of evaluation are, e.g., adaptivity in organising the course contents, personalised curriculum sequencing and adaptive link annotation. Cristea and Garzotto describe a taxonomy that contains key dimensions for the design of AEH applications and that guides teachers and course developers through the design process [36]. They stress the importance of a sound design. Important issues are the design of the content domain (concept types and their relations), the instructional view (the learner-specific path through the application), the detection mechanism, and the adaptation and user model. Although this model does not offer evaluation criteria as such, it is a kind of evaluation tool in itself. The different parts of the model are well suited to analysing an AEH system; they are not, however, useful as quality factors. An AEH system that complies closely with the model can be regarded as a good system. Conversely, if a system fails to match the majority of the design dimensions, it can, to a certain extent, be considered an inferior system. In other words, the design aspects mentioned are essential for any AEH authoring application. Kravcik and others carried out a quantitative and qualitative evaluation of AEH systems [53] and propose issues such as error tolerance, personalisation, simplicity and consistency. The issues they bring up are directly derived from ISO standards, so these standards are used to deliver the criteria for the evaluation framework. The evaluation issues useful for the evaluation framework are derived from ISO 9241, parts 10 and 12. 
As stated before, the educational world has long paid little attention to the development of AEH systems; this, however, has recently begun to change. Min and De Diana [59] developed a questionnaire to assess AEH authoring systems. As they are part of the Minerva/ADAPT project, the focus is on design aspects. The questionnaire forms a combination of analysis (dimensions, models, learning styles) and evaluation (editors for the different dimensions, completeness, user-friendliness). The questionnaire provides some valuable aspects that are to be included in the framework.

CHAPTER 4. A generic model for evaluating AEH environments

In this chapter, the evaluation of AEH authoring systems is considered from several different perspectives. These different views lead to certain criteria for evaluation, which are grouped into technological, educational and end-user criteria. These groups of criteria match the concerns of the stakeholders that were introduced in chapter 1. All criteria are derived from the literature discussed in the two preceding chapters. Eventually, this chapter presents a generic framework to evaluate AEH authoring systems; this model contains all selected criteria. To make the concerns of the stakeholders concrete, criteria must be selected at a certain point, because otherwise it would be impossible to evaluate the concerns of a certain stakeholder. To assess 'technology', for instance, is too abstract a task. These criteria have to be chosen on solid grounds. For this reason, quality factors are formulated implicitly. These quality factors describe the concerns of the different stakeholders. The three categories of aspects mentioned each contain the concerns of a different stakeholder. The educational (or conceptual) domain describes the concerns in the problem domain and the technological issues together depict the solution domain. For obvious reasons, the end-user domain serves the stakeholder end-user. The three stakeholder domains together cover all parties involved in the evaluation of AEH authoring systems. As stated before, this is the reason the framework is generic. The division into three domains is fixed, as opposed to the criteria they contain. This research describes a framework containing a list of criteria, based on certain selection criteria, which are mentioned at the end of this section. Other researchers could choose another selection method. They could choose another or a longer list of evaluation criteria. However, the generic framework will stay the same, i.e., the division into three domains (technological, educational and end-user) is not in question. Each stakeholder domain brings about certain quality factors, from which, ultimately, evaluation criteria are derived. The criteria which are used to evaluate each of the domains are both necessary and sufficient. Each of the criteria in a certain category has to be necessary for the concerns of the stakeholders to be satisfied. All the criteria together have to be sufficient in this respect. In other words: if a certain criterion were left out of the framework, the framework would be incomplete (i.e., all the criteria must be necessary); and if the list of criteria is complete, the system can be evaluated (i.e., the criteria together are sufficient). The two preceding chapters describe many sources from literature. All in all, an attempt has been made to present a complete view on the field of AEH, evaluation issues of AEH and the subjects closely related to it, such as software quality and existing frameworks. From the literature, thirteen sources are selected, all containing different and sometimes overlapping issues and criteria for evaluation. Most sources already describe metrics with which the criteria can eventually be assessed. The following section describes the selection method used to select the criteria that are to be in the final framework, and the corresponding metrics. For selecting technological criteria, a check was carried out as to whether issues (criteria) form a solution or answer to issues in the problem domain, i.e., the educational/conceptual domain. Educational issues and criteria are selected from existing evaluation methods and literature describing known issues in the field. Selected criteria are based on those issues that have proved their value in other evaluation methods and are, as such, labeled by other researchers as established. 
Criteria in this domain evaluate everything that has to do with the adaptation process and is indispensable; from input acquisition to the drawing of conclusions, the models on the basis of which decisions are made, the extent of adaptivity of the system and aspects of reusability and shareability. Together, these criteria provide a complete view on the quality of the adaptation process. In order to select criteria for the end-user domain, each issue is scrutinised as to whether it describes an essential aspect for the teacher or course developer. The end-user should be able to create lessons easily, without concerns on, for instance, installation issues, or a non-workable system interface. Literature provides advanced methods to assign metrics to a certain criterion. Because of time constraints, this research assigns metrics to criteria in a more natural way, by following literature already studied within this research. Metrics are combined from different sources and if a criterion is taken from literature, this does not automatically mean its metrics are all taken along.

4.1 Criteria to evaluate AEH authoring applications

In order to present a framework for evaluating AEH authoring systems, this section presents several criteria on the basis of which aspects such as, amongst others, performance and effectiveness can be measured. The criteria are divided into three subsections, according to the group of aspects they influence; the subsections are technological, educational and end-user related. The group of technological issues involves aspects of languages (such as HTML and XML/RDF) and interoperability between systems. The subsection on educational criteria is a rather broad one; it includes everything from conceptual structures to standardised meta data languages. The group of tools-related or end-user issues covers everything related to the teacher or author and his capability to perform tasks in a certain AEH environment. All the criteria extracted from literature, issues, other frameworks or methods are presented in the following section, without separation by origin or citation. Adaptation techniques (see section 2.1.1), such as the hiding or dimming of fragments of text, are the effect of the adaptation process carried out by the system. They are referred to as the part of the system that is detectable by students, the so-called 'front-office'. Adaptation techniques can be traced back by looking at aspects in the educational/conceptual domain, the adaptation decision in particular.

4.1.1 Technological criteria to evaluate AEH

This category of criteria includes aspects of adaptive educational hypermedia applications that are concerned with the technology of the systems. The technological aspects, together with the ones mentioned in the following two sections, form a part of the 'back-office' of AEH systems and are not to be confused with the adaptation techniques, described in section 2.1.1. The criteria in this domain assess the technological aspects that support the educational (conceptual) processes.

data-independence: Course materials written in, for instance, XML or another data-independent language, which separates contents from presentation, offer possibilities to share materials between many different systems, each one using its own presentation format.

interoperability between different systems: If two or more systems use similar data formats or share the same standards for communication, interoperability becomes available. Systems are then able to share material more easily.

modular composability: Component based programming can make software both reusable and shareable; two very important educational evaluation criteria.

reliability: The extent to which a program can be expected to perform its intended function with required precision.

accuracy - The degree of precision of computations and control.
consistency - The extent of uniform design and documentation techniques used.

According to Aksit [1], for a software product to be adaptive it must be able to deal with uncertainties in its environment. To a certain extent the software can be programmed to recognise and deal with different kinds of situations. However, choices must be made regarding the situations to recognise and deal with. In this choice lies the fixation of the adaptability of the software. The same applies when introducing component based programming; the nature of the available components determines the adaptation behaviour of the system. When building an adaptive software product, it is impossible not to make certain issues fixed. In other words; a program can never be fully adaptable, by definition. In AEH, however, the adaptability is usually covered within the user model(s). The concepts for these models are created by the authors of the software and updated during the use of the application.

The data-independence criterion is based on Devedzic [43], Cristea [29] and McCall [54]. The latter describes the criterion as being independent of nonstandard languages, whereas this report follows the definition of Devedzic, who puts the emphasis on the interoperability aspect of data-independence; e.g., HTML is a standard mark-up language, but it is not applicable if separation of contents from presentation is desired. A system using data-independent languages is expected to be better at supporting reusable and shareable material, which are issues in the educational/conceptual area. As with the first criterion, interoperability is defined on the basis of the same literature. McCall [54] stresses the importance of standard interfaces, whereas Devedzic lays emphasis on the use of different layers. This criterion is important when developing shareable materials. Both McCall [54] and Aksit [1] discuss the role of components in developing applications that require a high degree of shareability and reusability. Modular composable systems are systems with one or more separate components for a specific system function. The reliability criterion is derived from the quality factor models of section 3.1; it is measured by accuracy and consistency. This criterion is named in both models, McCall's and Hewlett-Packard's. However, not all of the metrics mentioned are actually used. The complexity and modularity of the software are not applicable in this research, and error tolerance and simplicity are measured with the end-user criteria. Reliability is an issue when evaluating, e.g., the input acquisition.
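The separation of contents from presentation that underlies the data-independence criterion can be illustrated with a short sketch. It stores one course fragment in a made-up XML vocabulary and renders it in two presentation formats using Python's standard `xml.etree.ElementTree` module. The element names (`concept`, `title`, `body`) and the two renderers are hypothetical; they are not the formats used by MOT, AHA! or WHURLE.

```python
import xml.etree.ElementTree as ET

# Hypothetical course fragment: it holds contents only, no presentation.
COURSE_FRAGMENT = """
<concept id="c1">
  <title>Adaptive navigation</title>
  <body>Links are annotated according to the user model.</body>
</concept>
"""


def to_html(concept):
    """Presentation format of one system: an HTML snippet."""
    title = concept.findtext("title")
    body = concept.findtext("body")
    return f"<h2>{title}</h2><p>{body}</p>"


def to_plain_text(concept):
    """Presentation format of another system: plain text."""
    title = concept.findtext("title")
    body = concept.findtext("body")
    return f"{title.upper()}\n{body}"


concept = ET.fromstring(COURSE_FRAGMENT.strip())
print(to_html(concept))
print(to_plain_text(concept))
```

The same stored fragment serves both renderers unchanged, which is the property that makes the material shareable between systems with different presentation formats.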

4.1.2 Educational criteria to evaluate AEH

This section covers educational, or conceptual, issues. The two terms denote the same set of criteria: because educational AH systems are being studied, all conceptual criteria are automatically of an educational nature. Conceptual criteria are by definition technology-independent [36].

reusability of content material: In order to create a large network or database of (links to) available course materials, the contents should be designed to be used, continually, in different courses. Contents can be just plain HTML, audio, video, or even an applet.

atomicity - A crucial concept here: each piece of content should cover only one or a few definitions.
well-definedness - The degree to which the contents of the information chunks are described.

possibility to share content material: Shareable course materials pave the way for a broad network of resources that can be used by AEH systems.

static vs. dynamic adaptation: An application using static adaptation determines the user specifications at the beginning of a session. Throughout the course, the user model will not be updated. In contrast, an application using dynamic adaptation constantly updates the user model according to the actions the user takes.

input acquisition: The input acquisition needs to be evaluated on the following aspects:

reliability - The extent to which the system produces the same results, given the same circumstances.
accuracy - The quality of nearness to the truth or true value.
precision - Whether or not the input is well-defined; unambiguous, correct, etc.
latency - The time that elapses between a stimulus and the response to it.
validity - The quality of being logically valid.

inferences drawn: The 'meaning' and 'semantics' of the conclusions the system draws on the basis of the data input need to be evaluated on correctness and validity. These inferences are reflected in the user properties, i.e., the system updates the user model according to its findings.

adaptation decision: Given the user properties stored in the user model and the inferences drawn, the system has to decide on what kind of adaptation to perform. This decision needs to be evaluated on the following aspects:

necessity of adaptation - Whether or not an adaptation is indeed required in the current interaction context.
appropriateness of adaptation - The extent to which the adaptation decision contributes to the requirements posed by the current interaction context. Sometimes no adaptation at all is the best option.
usability after adaptation - The system must be usable after the adaptation has been carried out.

user, adaptation and domain model: All information the system has on users, the domain and the adaptation process is stored in models. Based on the inferences drawn, these models are updated. They need to be evaluated on the following aspects:

representation - The actual state of a model should be correctly and accurately reflected. This criterion is also described in the following section.
comprehensiveness - The model has to entirely represent the inferred/interpreted information on the entity being modeled.
redundancy - Does the model contain 'attributes' of the entity being modeled, which cannot be inferred from interaction?

The reusability criterion of AEH authoring applications is based on McCall [54], who defines a metric called generality to measure the breadth of potential application of program components. Devedzic [43] and Cristea [29] also refer to this issue. Reusability cannot be mentioned without referring to shareability. This second educational evaluation criterion is defined by Devedzic [43] and Cristea [29] as necessary for achieving a broad network of course materials. Reusability and shareability are both recognised as the most valuable (design) aspects of AEH authoring applications (see chapter 2).

One of the ongoing discussions in the AEH research field concerns static versus dynamic adaptation, or adaptable versus adaptive hypermedia. As described in chapter 2.1, the former is defined as hypermedia whose adaptive capabilities are predefined, or fixed, at the design of the system. The adaptive possibilities consist of the opportunity for the learner to set certain variables. Dynamic, or truly adaptive, systems, on the other hand, are characterised by the ability to adapt to each user personally at runtime. Adaptation techniques (see section 2.1.1) are deployed to yield adaptiveness.

Evaluating the input acquisition is very important, as this process forms the starting point for the adaptation process. All other steps depend on correct information about the user (the student). The input acquisition is found to be important by Gupta and Grover [49] and Weibelzahl [74, 75], who both define aspects for evaluating this criterion in their frameworks. Subsequently, they discuss the inferences drawn on the basis of the input and the adaptation decision, as do Karagiannidis et al. [18]. The outcome of the adaptation decision takes effect through the use of adaptation techniques. The appropriateness aspect assesses whether the adaptation method or technique used contributes to the requirements set by the outcome of the inferences drawn earlier.

All three also stress the great importance of correct models. In any AEH system, the adaptation is based on information stored in the user, domain and adaptation models. The user model contains information on the user's characteristics and knowledge levels; the adaptation model is an overlay of the domain model which contains only the parts of the course materials relevant for the student involved. A great part of the evaluation framework is based on the issues described here. The reason is that, for the framework to be really applicable to educational AH systems, the concepts, the relations between the concepts, the adaptation decisions, etcetera are of major importance. This orientation on design is the one thing Cristea [36, 37], as a pioneer in the field and one of the leaders of the Minerva/ADAPT project, keeps stressing.
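
The difference between static and dynamic adaptation can be illustrated with a small Python sketch. It is not taken from any of the three systems: the class names, the three knowledge levels and the promotion rule (one level per two correct answers) are assumptions made purely for illustration. The sketch only shows that a static system fixes the user model at the start of a session, whereas a dynamic system keeps updating it from the user's actions.

```python
from dataclasses import dataclass

# Assumed knowledge levels, ordered from lowest to highest.
LEVELS = ["beginner", "advanced", "expert"]


@dataclass
class UserModel:
    level: str
    correct_answers: int = 0


class StaticAdaptation:
    """The learner sets the level once; the model is never updated."""

    def __init__(self, initial_level: str):
        self.model = UserModel(level=initial_level)

    def record_answer(self, correct: bool) -> None:
        # User actions are ignored: the model stays as it was set.
        pass


class DynamicAdaptation:
    """The model is updated after every observed user action."""

    def __init__(self, initial_level: str):
        self.model = UserModel(level=initial_level)
        self._start = LEVELS.index(initial_level)

    def record_answer(self, correct: bool) -> None:
        if not correct:
            return
        self.model.correct_answers += 1
        # Hypothetical rule: two correct answers raise the level one step.
        steps = self.model.correct_answers // 2
        self.model.level = LEVELS[min(self._start + steps, len(LEVELS) - 1)]
```

After four correct answers, a learner who started as a beginner is still a beginner in the static variant, while the dynamic variant has promoted the same learner to expert.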

4.1.3 End-user criteria to evaluate AEH

This section deals with the criteria that have to do with the end-user of the AEH authoring application: the teacher or course developer. Issues that are considered vary from the way the user perceives the information presented to available user support features.

usability: These usability heuristics form the 'dialogue principles' or dialogue techniques as defined in ISO 9241 part 10.

suitability for the task - The dialogue should be suitable for the user's task and skill level.
self-descriptiveness - The dialogue should make it clear what the user should do next.
controllability - The user should be able to control the pace and sequence of the interaction.
conformity with user expectations - The dialogue should be consistent.
error tolerance - The dialogue should be forgiving.
suitability for individualisation - The dialogue should be customisable to suit the user.
suitability for learning - The dialogue should offer novice user support.
ease of installation - The effort required to install the system.

presentation: Part 12 of ISO 9241 contains recommendations on how to present visual information on screens so that users can easily perform 'perceptual tasks' (such as searching for information on the screen). The recommendations are based on seven guiding principles:

clarity - Information should be conveyed quickly and accurately.
discriminability - It should be possible to distinguish the displayed information accurately.
conciseness - Provide only the information necessary to complete the task.
consistency - Present the same information in the same way throughout the application.
detectability - Direct the user's attention to the information required.
legibility - Information should be easy to read.
comprehensibility - The meaning should be clearly understandable.

transparency of the models: How does the user's perception of the maintained models match the actual state of the models? This criterion is also described in the preceding section.

completeness - Does the user have a full (perhaps abstracted) view of what is modeled and of the current contents of the model?
coherence - How well can the user understand the attributes of the model?
rationality - Does the user understand why the model is in its current state?

acceptance of adaptation: How does the user react to the adaptation?

timeliness of adaptation - Is the decided-upon adaptation applied in a timely manner, e.g., not too late?
obtrusiveness of the adaptation - How obtrusive, or obstructive is the application of an adaptation, with respect to the user's main interaction tasks?
user control over the adaptation - Can the user disallow, retract, or even disregard an adaptation?

end-user support: The system can support the teacher or course developer by automatically suggesting relevant information. Possible features are the adaptive ordering of course materials, auto-evaluated questions and knowledge-based navigation support for examples and solutions analysis.

editors: Editors are necessary to communicate with the authoring system and give input on models, the adaptation process, etcetera.

dimensions - Which dimensions are recognised and used by the system?
completeness - To what extent are the editors covering all the aspects of a dimension?
user-friendliness - Are the editors easy to operate?

The usability and presentation criteria are referred to by Kravcik [53] and are fully described in ISO 9241 [45]. This ISO standard defines usability as "(the...) extent to which a product can be used by specified users to achieve specified goals with effectiveness, efficiency and satisfaction in a specified context of use". Furthermore, the standard deals with the criteria for evaluating the interaction between humans and computer systems. The usability criterion is further refined using Paramythis [66], Min [59], McCall [54] and Hewlett-Packard [48]. The presentation criterion is further refined using Gupta & Grover [49] and Paramythis [66]. A usable system with proper presentation features is essential for a workable environment for the teacher or course developer. Metrics for evaluating the transparency of the models are defined by Paramythis [66], as are those for the acceptance of adaptation. The latter are further refined with the help of Gupta & Grover [49]. The end-user has to be able to get a correct view and understanding of the different models quickly. Otherwise, s/he is held up in the development of new course materials by lingering on these aspects for too long. The same applies to the acceptance criterion: for the end-user to accept adaptations, these have to be clearly defined.

The end-user support evaluation criterion is valuable in itself and is based on Brusilovsky [43] and also on McCall [54] and Hewlett-Packard [48]. The criterion for evaluating the editors is derived from the questionnaire of Min and De Diana [59]. Having multiple editors supports the teacher or course developer in the process of creating new course materials and adapting existing ones.

4.2 A generic framework for evaluating AEH authoring systems

The criteria described in the last three sections are applied in the generic evaluation framework presented below. The framework preserves the division into categories introduced in the preceding section and consists of three tables, each representing one category of criteria. The main purpose of the framework is to make the criteria operational, i.e., measurable. This serves the overall purpose of the research described in this report: the evaluation of AEH authoring systems. The meaning and interpretation of the components with which the criteria are measured are discussed briefly after each table.

The assessment method formulated for this purpose consists of a range of values: --, -, ±, + and ++. For each criterion, the extent to which certain metrics apply is determined. If a metric does not apply at all, -- is assigned; if it applies fully, ++ is assigned. Intermediate values are assigned when a metric applies only to a certain extent. As stated before, there are several other methods for assessing metrics, most of which are of a more scientific nature. However, due to time constraints, these methods are not taken into consideration.

When applying the framework, one must take the point of view of the teacher or course developer (called user in this research). With certain criteria, such as the 'ease of installation', this might be confusing, because it is arguable whether the user has anything to do with installation issues. This research chooses to integrate the installation process in the evaluation framework. It is considered an integral part of the evaluation because, at this stage of the development of the three test systems, it is not an option to 'just' use a certain system. Normally, an installed version of an AEH authoring application would be available, with all the technical issues hidden from the user by the system administrator of the school or learning environment involved. For this reason, the ease-of-installation criterion could perhaps be considered questionable. This will be discussed in chapter 5. As stated before, this research makes a clear distinction between the user (or end-user) and the student. In much other research, the student is referred to as the user. This research considers the teacher, or course developer, as the (end-)user.

Table 4.1: Technological criteria to evaluate AEH authoring applications.

As shown in table 4.1, the technology aspects of AEH authoring applications are measured through four criteria: data-independence, interoperability, modular composability and reliability. If a system uses only data-independent languages, ++ is assigned. Conversely, a system without any data-independent language gets --. Applications that use both data-independent and non-data-independent languages are assigned -, ± or +, according to the proportion between the two. For the measurement of interoperability the same distribution applies. A system completely built out of components is awarded ++; a system that is not gets --. In most cases, one of these two situations applies. In the rare case that a system uses component technology only in part, intermediate values are assigned accordingly. The accuracy and consistency metrics for the reliability criterion are scaled from -- to ++, according to the extent to which they are fulfilled.

Table 4.2: Educational criteria to evaluate AEH authoring applications.

The second and third categories of criteria to evaluate AEH authoring systems, educational and end-user criteria respectively, are measured and valued in the same way as the technological criteria. Metric 8.4, for instance, the latency of input acquisition, is valued ++ for a real-time response, -- if the response takes longer than ten seconds, and an intermediate value for other response times.
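
The latency rule can be sketched in Python as follows. Only the two endpoints come from the text (++ for a real-time response, -- above ten seconds). The one-second threshold for 'real time' and the even split of the remaining interval over the three intermediate grades are assumptions made for illustration, and the middle grade of the scale is written ± here.

```python
REAL_TIME_S = 1.0   # assumed threshold for "real time"; not given in the text
TOO_SLOW_S = 10.0   # from the text: longer than ten seconds scores --


def grade_latency(seconds: float) -> str:
    """Map a measured response time to a grade on the five-point scale."""
    if seconds < 0:
        raise ValueError("latency cannot be negative")
    if seconds <= REAL_TIME_S:
        return "++"
    if seconds > TOO_SLOW_S:
        return "--"
    # Assumption: the three intermediate grades divide the interval
    # (REAL_TIME_S, TOO_SLOW_S] into equal parts.
    width = (TOO_SLOW_S - REAL_TIME_S) / 3
    if seconds <= REAL_TIME_S + width:
        return "+"
    if seconds <= REAL_TIME_S + 2 * width:
        return "±"
    return "-"
```

An evaluator who prefers other thresholds only needs to change the two constants; the grading logic itself stays the same.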

Table 4.3: End-user criteria to evaluate AEH authoring applications.

CHAPTER 5. The application of the framework on MOT, AHA! and WHURLE

In order to develop the evaluation framework, this chapter presents a test lesson that is applied to the three test systems MOT, AHA! and WHURLE. This evaluation not only validates the framework, but also presents findings on the three test systems. After carrying out these tests, a final, generic evaluation framework for AEH authoring applications remains.

5.1 The test lesson -- Transistor Example

The original three-dimensional lesson was created by Min [58, 55] and converted to HTML by students in 2004. The structure of the Transistor example is as follows:

Transistor [for all levels]
- Intro
- Image
- Symbol
- Circuit
- Innovation
Model
- Circuit [for all levels]
- Symbols [for Beginners only]
- Listings [for Experts only]
Simulation [for Advanced and Experts only]
- Instruction
- Exercise
- Simulation
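
To make the level annotations concrete, the outline can be written down as a small data structure and filtered per knowledge level. This is a sketch only: the names LESSON and visible_chunks are hypothetical, and it assumes that there are exactly three levels (beginner, advanced, expert) and that a sub-item without its own annotation inherits the levels of its chapter.

```python
ALL = frozenset({"beginner", "advanced", "expert"})  # assumed set of levels

# (chapter, chunk, levels that are allowed to see the chunk)
LESSON = [
    ("Transistor", "Intro", ALL),
    ("Transistor", "Image", ALL),
    ("Transistor", "Symbol", ALL),
    ("Transistor", "Circuit", ALL),
    ("Transistor", "Innovation", ALL),
    ("Model", "Circuit", ALL),
    ("Model", "Symbols", frozenset({"beginner"})),
    ("Model", "Listings", frozenset({"expert"})),
    ("Simulation", "Instruction", frozenset({"advanced", "expert"})),
    ("Simulation", "Exercise", frozenset({"advanced", "expert"})),
    ("Simulation", "Simulation", frozenset({"advanced", "expert"})),
]


def visible_chunks(level: str) -> list:
    """Return the (chapter, chunk) pairs a learner of this level may see."""
    if level not in ALL:
        raise ValueError(f"unknown level: {level}")
    return [(chapter, chunk) for chapter, chunk, levels in LESSON
            if level in levels]
```

Under these assumptions a beginner sees seven chunks, an advanced learner nine and an expert ten.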

Figure 5.1 shows a screenshot of the original transistor product, as developed by Min and others [55]. Also shown is the 'simulation' dimension. The homepage contains some explanatory text about the site and the lesson. The first chapter, Transistor, provides some general information on transistors in four paragraphs. The last paragraph has a link to some historical information. The second chapter, Model, teaches students about the working of transistors through models and formulas. The third chapter, Simulation, contains a Java applet with the simulation of a transistor, preceded by some instructions and exercises.

Figure 5.1: Screenshot of the original transistor lesson

Although this test contains only three dimensions (text, visuals and intelligence, i.e., a simulation), it is representative of all six dimensions, because if an applet or a simulation can be added, an animation, a sound fragment or a video can be added as well. This certainly applies to MOT, as it provides one (HTML) editor for all six dimensions. The original test lesson described in this section was intentionally designed so that any AEH authoring system can be evaluated without much effort. The lesson contains only a few chunks on purpose, though it presents information in various dimensions (text, graphics and intelligent simulation). The lesson also provides a six-question test to determine a user's knowledge level. This rather simple, though complete, lesson should make it easier for developers or evaluators to assess a system while still providing valuable feedback.

5.2 Before the test

The results of the test lesson in MOT, AHA! and WHURLE described below were obtained by applying the test lesson only once to each of the systems. Being a single evaluation, the test does not yield a definite list of criteria. Some criteria could be found questionable, as they cannot be judged well or do not differ much between systems. Other criteria can be found definite, for the very same reason. Later research might prove otherwise: criteria which now seem questionable may become definite and vice versa. This arises from the single character of the test. Still, the framework should be sound, as its foundation is solid. As stated before, the division into the three categories stands, as it describes exactly the three domains of stakeholders involved. Other research can, however, propose other criteria to evaluate issues in a certain category. The grades are based on the operationalisation of the criteria found in existing literature, which is of a very diverse nature. Many of the criteria used are combinations of various descriptions. The grades can, of course, be subjective to a certain extent. Nonetheless, another researcher should, in theory, arrive at the same results when applying this framework. Because the criteria are solidly operationalised, all grades and thus conclusions can be traced back by anyone who wants to repeat the research. However, it must be stressed that the test is performed only once on each system, by one researcher, using one lesson.

5.3 Findings from the application of the test lesson

This section discusses the results of the application of the three-dimensional transistor circuit test course in MOT, AHA! and WHURLE.

5.3.1 MOT

The screenshot in figure 5.2 shows the transistor circuit test course created in MOT. Section 2.2.1 explains in detail the basics of MOT. In order to separate conceptual information (semantics) from the real information carriers, MOT depends on the LAOS model. This separation is necessary for presenting different contents to different students.

Figure 5.2: Screenshot of the transistor lesson in MOT

MOT is both data-independent and interoperable [34], as it uses a MySQL database for storing concept information, CGI scripts for communicating with the database and Perl to process most of the information. These three technologies are widely known for being open and independent. The adaptive and non-adaptive parts are separated in MOT. This decomposition is very useful, not only for analysis, but also for reusability and shareability. It could not be determined whether the system is completely built out of components. MOT is found to be not quite accurate, as it is sometimes unclear what follows from a certain action. The consistency is assigned +, because it fares better than accuracy but still shows some inconsistencies. For instance, adaptive strategies can be added in different ways; there is no conformity between the processes. Every concept stored can be made atomic by the teacher or course developer. MOT supports this with an easy-to-use graphical interface for creating concepts (with optional descriptive keywords: ± for well-definedness) and by letting teachers import concepts created by others. The latter says something about the shareability of the MOT course materials. MOT supports the creation of both adaptable and adaptive applications, so the criterion on static vs. dynamic adaptation is valued ++. The quality of the input acquisition depends on the correctness with which the teacher or course developer creates the course materials and the adaptive relationships between them. MOT provides a well-structured and highly supportive interface, extremely well suited for building solid lessons. Summarising, the reliability and the latency are not influenced by a third party and, therefore, score ++. The other metrics depend more or less on the user; being thoroughly supported by MOT, they behave exactly as instructed and get a +.

MOT has no difficulties making the correct decisions (++) on the basis of the input. The adaptation decision depends on several factors. The necessity is as good or correct as the teacher makes it. The extent to which the adaptation is appropriate is high: because MOT offers many adaptation techniques (see page 2.6), the user is able to select the optimal adaptation (e.g., link annotation or fragment hiding). The usability after adaptation is also high, as no parts of the interface change structurally. The version of MOT with which the testing was conducted did not contain a representation of all the models. For the models that are present, it is not always clear which state they represent. All attributes of the models result directly from choices made by the user.

MOT does not score that high on usability, because it is sometimes too technical for the average teacher or course developer. Moreover, MOT lacks a proper manual or a wizard or assistant, and the different components for building a lesson are not self-descriptive. The system is not very tolerant of failures, such as the accidental removal of course materials. MOT is, however, suitable for the task (though rather difficult) and controllable by the user, provided the user is sufficiently skilled. MOT is not suitable for individualisation; each user is presented with the same interface for developing lessons. The installation of MOT can be very difficult, especially for the average user who is not accustomed to installing software such as web services and MySQL databases and to coupling the different parts.

The clarity aspect of the presentation criterion is assigned --, because the system sometimes fails to present the information stored in a concept correctly. This is, for instance, the case when HTML is entered in a text field. The difference between a concept map and a lesson is, especially for a novice user, rather vague. One other detail in this context is that MOT fails to present a good view of how a lesson will turn out: MOT does not have a presentation function. MOT scores a + on conciseness, as the interface is not cluttered with irrelevant information. As it is not always directly clear what the next relevant step is, the value for detectability is ±. MOT also gets a ± on consistency of the presentation, as the same adaptation technique can sometimes be added in different ways. One has to be accustomed to MOT in order to understand all the information presented through the interface. This makes MOT somewhat illegible and, to a lesser extent, incomprehensible.

The user is unable to get a complete view of the contents and the meaning of the models being represented. The contents of the domain model are quite easy to read, as they are shown in concept maps. The adaptation and user models, however, are not made visible through the interface in a single view. The user has to make interpretations to judge their correctness. The same applies to the coherence metric: attributes are easily understood, though only in the context of the domain model. All the changes the user makes are directly processed by the system, so the timeliness metric is valued ++. An interaction between the user and the system is obtrusive, as an operation that has been started first has to be completed. However, the user always stays in full control of the operation and can undo it, if so desired. All in all, ± is assigned. On end-user support, MOT scores a +, because it presents each user with similar concepts when a concept is created. This is done on the basis of (key)word matching. Another form of support could be the automatic generation of pre- and post-tests. The criterion on MOT editors is a bit complicated. On the one hand, all six [56] dimensions can be used in a lesson. On the other hand, though, MOT does not offer specific editors for, e.g., graphics. As MOT falls short here, it gets - for completeness, + for dimensions and ± for user-friendliness, because only one editor is provided.
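
The keyword-based suggestion of similar concepts mentioned above can be sketched as follows. MOT's actual matching algorithm is not described in this report, so the Jaccard overlap measure, the threshold of 0.3 and all function names below are assumptions chosen for illustration.

```python
def keyword_overlap(a: set, b: set) -> float:
    """Jaccard overlap between two keyword sets (0.0 when either is empty)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def suggest_similar(new_keywords: set, existing: dict,
                    threshold: float = 0.3) -> list:
    """Return names of existing concepts whose keywords overlap enough
    with those of a newly created concept, best match first."""
    scored = [(keyword_overlap(new_keywords, keywords), name)
              for name, keywords in existing.items()]
    # Sort by descending score; ties are broken alphabetically by name.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for score, name in scored if score >= threshold]
```

For example, with existing concepts "Transistor" (keywords transistor, semiconductor, circuit) and "Diode" (keywords diode, semiconductor), a new concept tagged semiconductor and circuit would be matched with "Transistor" first and "Diode" second.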

5.3.2 AHA!

As AHA! has been around for some years now, it has developed into a highly extensive and well-documented product. As there are different editors and many ways to get things done, it is not easy to just start building lessons in AHA!. One first has to understand the underlying ideas of the system and create some initial XHTML files, such as an index and a registration/login page. Figure 5.3 shows a screenshot of the transistor test lesson created in AHA!. On the left side is the menu structure, where the red balls indicate that the user, being advanced in this example, is not supposed to access this information. The link still functions, in spite of being unsuitable. Contents available to the user are marked with a green ball and information already read is marked gray.
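
The colour coding in the menu is a form of link annotation. A minimal sketch of such a rule is given below. It is not AHA!'s own implementation: the function name is hypothetical, and the sketch assumes that 'already read' takes precedence over suitability, which the observations above do not settle.

```python
def annotate_link(suitable: bool, visited: bool) -> dict:
    """Choose the ball colour for a menu link.

    Assumption: a page that has already been read is shown gray,
    regardless of whether it is suitable for the user's level.
    """
    if visited:
        colour = "gray"
    elif suitable:
        colour = "green"
    else:
        colour = "red"
    # As observed in the test lesson, the link keeps working even
    # when it is marked as unsuitable.
    return {"colour": colour, "clickable": True}
```

The rule only annotates links; it never disables them, which matches the observation that a link marked with a red ball can still be followed.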

Figure 5.3: Screenshot of the transistor lesson in AHA!

As with MOT, AHA! is an open source system which incorporates data-independent technologies. For storing information, one can choose between a MySQL database and XML files. Furthermore, AHA! uses Java servlets to present information to students. These technologies, together with the use of open software such as Tomcat for serving the servlets over the internet, make AHA! interoperable as well. AHA! offers several editors for creating lessons. The Concept Editor is a low-level tool which forces the user to manually create adaptation rules between concepts, which are the information stores. On the other hand there is the Graph Author, which lets the user draw lines between concepts to mark relationships (e.g., is prerequisite of). This freedom of control makes AHA! very accurate, at least when using the Concept Editor. Because the output of the Graph Author can be read by the Concept Editor, the application is found to be quite consistent. However, one has to pay attention to small differences arising from these translations.

The Graph Author tends to simplify matters, so information can be incomplete when translated to lower-level tools. Unlike MOT, and in contrast to the way relationships (and thus adaptation rules) are added in AHA!, the system offers no editor for the actual course materials. The user has to write (X)HTML pages that are eventually linked through the use of, for instance, the Concept Editor. The atomicity is fully dependent on the input of the user and so is the reusability of the concepts; the value assigned here is ± for both atomicity and well-definedness. The same applies to shareability, as other users cannot automatically deduce the contents of an information-containing (X)HTML file. This is contrary to MOT, where keywords can be added to each piece of information. AHA! assists in the creation of both adaptable (static) and adaptive (dynamic) lessons, so here a ++ is assigned. With regard to the input acquisition, again a distinction should be made between the different kinds of editors. The low-level tool assures a highly reliable, accurate, precise and valid input. The high-level Graph Author cannot always be trusted, as a lot depends on the user's correct input. Of course, this applies to the Concept Editor as well, but with that tool the user can actually see what is happening. The Graph Author hides the adaptation rules it deduces from the graphical input of the user. The latency is good in any case, as AHA! responds immediately to changes made by the user. The necessity metric with which the adaptation decision is evaluated depends on the user's input; a + is assigned. The user is offered the use of every adaptation technique s/he can think of, as there is no limitation on the formulation of adaptation rules. The appropriateness of adaptation is therefore assigned ++. The adaptation decision does not affect the usability of the system. The different models, such as the domain model (containing the concepts and their relations), can be viewed in the Graph Author. With this tool, the user is presented with the complete contents of the model. However, the Graph Author does not show the attributes belonging to a concept. For this task, the Concept Editor has to be used. The values for representation, comprehensiveness and redundancy are +, + and ±, respectively.

The technical background of the developers of AHA! is clearly visible in the way AHA! works. This makes it difficult to create lessons, even for people with experience in IT. The self-descriptiveness of the authoring process is also lacking. These two metrics are assigned ± and -, respectively. The controllability is valued +, as the user is able to control the pace and sequence of the adaptation. The insertion of adaptation rules, concepts and pages is done the same way each time, so AHA! scores a + on the conformity with user expectations. Possible errors made by the user can be undone without much effort. To a certain extent AHA! is configurable by the user; in other words, one can choose to use, for instance, the Concept Editor on all occasions. The same index page in HTML can be used again with every new lesson. The suitability for learning is valued +, as AHA! offers a lot of documentation. The application is not easy to install, because of all the web services, databases and suchlike which have to be provided.

AHA! does well on both the presentation of the authoring process(es) and on showing the final lessons to the user. As with MOT, the difference between a page and a concept is not very obvious for inexperienced users. AHA! is not concise, because of, for instance, all the possibilities the Concept Editor offers. This increases the difficulty in recognising the next step. The consistency could also be better, as the Graph Author and the Concept Editor are two sides of the same coin, though they have entirely different interfaces. With some effort and assistance, the information presented through AHA! is readable, so the legibility is assigned ±. The comprehensibility is valued -, because the terms used are often not at all clear without the manual or help file.

The models in AHA! are, as such, quite transparent to the user. They are also complete (viewable in the Graph Author), coherent (the user defines the attributes himself) and rational (the current and next state are determined by the user). The timeliness of the adaptation is high (++); changes are directly processed. The interactions between the user and the system are not very obtrusive: the process can be aborted at any time. The Graph Author automatically supports the teacher or course developer; AHA! is valued + on end-user support accordingly. As with MOT, AHA! does not offer editors specifically for all six dimensions. Instead, pieces of course material in the form of, for instance, video have to be included through a back door, by including them in an (X)HTML page. The available editors are rather easy to operate, that is, if a manual is nearby. Editing pages is not supported by AHA! and has to be done through other applications.

5.3.3 WHURLE

Figure 5.4 presents the transistor circuit course that was created in MOT and displayed in WHURLE. The reason for not creating a lesson in WHURLE itself is twofold. The main reason is that WHURLE takes XML files as input for its lessons. There is a (text) editor available, but it does not function as intended. In fact, WHURLE is merely a 'presentation application' at the moment. On the other hand, WHURLE seemed to be the perfect system for presenting the lesson built in MOT, as a MOT-to-WHURLE translator was available at the time of research. The evaluation could still take place, as the criteria are measured through a combination of the MOT test lesson presented in WHURLE and the available literature on WHURLE.

Figure 5.4: Screenshot of the transistor lesson in WHURLE

WHURLE is to a large degree both data-independent and interoperable, as it uses only open standards to perform its functions. A MySQL database is used to store user profiles; chunks of information and lesson plans are created as XML files. XSLT stylesheets are subsequently used to process this information. The output is delivered as dynamic HTML and should, therefore, be suitable for presentation in almost every browser. WHURLE can be considered component based, as it offers three different modules for the adaptation process, the display engine and the skin layer. Even in the somewhat straightforward test presented here, WHURLE proved itself to be unreliable. It is unpredictable when information is returned or processed correctly. Therefore, the metrics accuracy and consistency are assigned -- and -, respectively. The reusability metric atomicity is accredited ++, as WHURLE is known for its support for atomic chunks. The degree to which chunks are well-defined depends, of course, largely on the input of the teacher, or, as WHURLE calls them, subject experts. The term teacher in WHURLE refers to the person who creates the lesson plans, which contain the chunks of course materials. In theory, this should all lead to a more than moderate level of shareability as well. WHURLE allows users to create highly adaptive applications, so the static versus dynamic criterion is assigned ++. The quality of the input acquisition again depends on the teacher or course developer. The system offers a timely response to operations; however, the accuracy, precision and validity rely on the user's input. The reliability scores, again, a -, as the version of WHURLE evaluated simply cannot guarantee a correct response. On the basis of the correct input, the system draws the correct, and thus valid, conclusions. Adaptations are as necessary as the user makes them. Their appropriateness in WHURLE is somewhat less; as the system does not offer many adaptation techniques, choices are minimal.
The usability after adaptation remains high. The representation of the actual state of a model in WHURLE is not always clear, as each teacher can use his own module to create or update the models. For this reason, the comprehensiveness is also accredited ±. The redundancy of the models is low (-), for their contents are untraceable for the user. The usability of WHURLE seems quite low from the evaluation. The suitability metric is assigned -, as the interface is too technical for the average user. The interface is not self-descriptive; the user has to puzzle it out himself, without the support of a proper manual. The level of control can be quite high, that is, if the user has enough knowledge of the functions offered by WHURLE. As stated before, the system makes mistakes, which are unpredictable in time and nature. Installing WHURLE and all the required web services is quite difficult, so this metric scores low as well. Because of the possibility to use own or customised modules for certain functions, the system is accredited + on suitability for individualisation. Because of the aforementioned possibility of unexpected errors, the clarity of presentations is low. The consistency, conciseness, detectability and comprehensibility are accredited -. The consistency is not good, as a real editor is not presented and it is more or less up to the user to generate correct chunks. For the same reason, the conciseness is rather low. The detectability and comprehensibility are low, because the meaning of the information presented through the interface is not always clear. The interface is legible, however; therefore, this metric is assigned ±. The discriminability between the different items displayed is sufficient, so a + is assigned. WHURLE contains a content model, consisting of conceptually discrete units called chunks, and a user model, built upon an overlay model and a stereotype model.
The overlay model measures the user's knowledge within a given domain, whereas the stereotype model classifies the user according to prior knowledge and ability [12]. The adaptation filters ensure that students are classified into three categories: novice, intermediate and advanced. The teacher has, in theory, an overview of the contents of all the models. He might not understand the different attributes connected to a certain model, especially given the fact that models can be imported from other sources. The same applies to the understanding of the different models. The timeliness of an adaptation is good, as is the obtrusiveness. The user control over the adaptation, however, is rather low, due to the fact that WHURLE selects a category for a student that more or less remains the same throughout the lesson. The end-user support offered by the system is moderate. On the one hand, WHURLE does not provide proper manuals, wizards or help functions. On the other hand, users are offered ready-made pre- and post-tests. Users can create XML files containing their own choice of dimension; the additional editors, which are not user-friendly, support this process to a certain extent only.
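The combination of a stereotype model and an overlay model described above can be illustrated with a minimal Python sketch. Only the three category names come from the text; the class name, the method name, the score range and the thresholds are hypothetical, and the sketch does not reproduce WHURLE's actual XML and XSLT pipeline.

```python
from dataclasses import dataclass, field

# Category names follow the text; everything else here is hypothetical.
CATEGORIES = ("novice", "intermediate", "advanced")


@dataclass
class UserModel:
    # Stereotype component: one coarse category per student.
    stereotype: str = "novice"
    # Overlay component: estimated knowledge per domain concept, in [0, 1].
    overlay: dict = field(default_factory=dict)

    def record_knowledge(self, concept: str, level: float) -> None:
        """Store a knowledge estimate for one concept, clamped to [0, 1]."""
        self.overlay[concept] = max(0.0, min(1.0, level))


def classify(pretest_score: float, low: float = 0.4, high: float = 0.75) -> str:
    """Map a pre-test score in [0, 1] to a category (thresholds are assumed)."""
    if pretest_score < low:
        return CATEGORIES[0]
    if pretest_score < high:
        return CATEGORIES[1]
    return CATEGORIES[2]


student = UserModel(stereotype=classify(0.55))
student.record_knowledge("transistor", 1.3)  # out-of-range value is clamped
print(student.stereotype, student.overlay)
```

In this sketch the stereotype is set once from a pre-test score, which mirrors the observation that the category more or less remains the same throughout the lesson, while the overlay can be updated per concept.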

5.4 Summary of the findings

After applying the test lesson on MOT, AHA! and WHURLE, many remarks can be made on the framework, on the findings and on the systems. This section describes the main issues.

Assessed is the extent to which a certain criterion or metric applies. Grades are accredited ranging from -- to ++, according to their applicability.
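As an illustration of how such grades could be handled when summarising a criterion, the following Python sketch encodes the scale as ordinal values. The numeric values, the neutral symbol "±" and the function names are assumptions made for this example; the framework itself does not prescribe a numeric encoding or an averaging rule.

```python
# Ordinal encoding of the grade scale from -- to ++.
# The numbers and the neutral symbol are assumptions for illustration only.
GRADE_VALUES = {"--": -2, "-": -1, "±": 0, "+": 1, "++": 2}


def grade_to_value(grade):
    """Translate a grade symbol into its assumed ordinal value."""
    return GRADE_VALUES[grade]


def mean_grade(grades):
    """Average the ordinal values of a list of grade symbols."""
    values = [grade_to_value(g) for g in grades]
    return sum(values) / len(values)


# Example: three metrics of one criterion graded +, + and ±.
print(mean_grade(["+", "+", "±"]))
```

Averaging treats the scale as if its steps were equally spaced, which is itself an assumption; a median or a simple tally of grades may be more defensible for ordinal data.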

5.4.1 Discussion

Table 5.1 shows the first two criteria, data-independency and interoperability, are valued ++ on all three systems. This is hardly strange, as the systems are developed within the scope of the Minerva/ADAPT project. One of the starting points of this project was the advice to use open standards and data-independent languages. The criteria could, in this context, become questionable because they do not separate the systems. However, it remains important to evaluate on these criteria and, therefore, they are not considered questionable. The use of components in a certain system is useful as an indicator for open and interoperable systems. The fourth criterion, general reliability of the system, seems a definite one. Next to the fact that it describes important issues, it brings discernment, as can be seen in table 5.1. Although the criteria on reusability and shareability are mentioned in the project specification of ADAPT (among others, appendix A and [35]), they are not all fulfilled by MOT, AHA! and WHURLE. These two criteria are not only very useful on their own, they have also proved their value in revealing deficits. The static versus dynamic criterion is perhaps questionable, as all adaptive educational hypermedia applications should, by definition, offer adaptive capabilities. The systems under evaluation are all assigned ++ here, but this does not necessarily mean this criterion has become superfluous. The reason is that it undoubtedly remains important, although obvious, that every AEH system contains true adaptive features. The quality of the input acquisition is a very important issue to evaluate, stressed by all other research in the field of AEH. Its first three metrics, reliability, accuracy and precision, look very distinctive. The latency metric is perhaps questionable, as it is accredited high in every system. This is not surprising, as modern computers have proved to be fast; speed is not an issue anymore.
Nonetheless, latency can be a useful metric, taking into account the fact that one of the goals of, e.g., the Minerva/ADAPT project is to create a broad network or database of cooperating AEH systems, i.e., systems should not be held up by slow connections. The validity of the input seems beyond doubt, just as the quality of the inferences drawn. This is not that unusual, as making valid and logically correct assumptions is a feature well known to computer systems. The adaptation decision gives information on the techniques used by a system. It more or less has the function of checking the presence of these features. Evaluating the models and their representation is beyond any doubt a very important issue, as most of the information concerning the course materials, the user (the student) and the adaptation process is contained in these models. Table 5.1 shows the discernment this criterion causes. Whereas the educational criteria are valuable for evaluating conceptual issues, the end-user criteria are useful for assessing the way the teacher or course developer is able to handle the system. The first criterion examines usability aspects, such as controllability and self-descriptiveness. As can be seen in table 5.1, all the metrics are quite distinguishing. Therefore, they definitely belong in the evaluation framework. The problems with the installation of MOT, AHA! and WHURLE are very noticeable. However, as stated, the installation issues could be excluded from the framework if the focus is completely shifted towards the user's point of view and it is taken for fact that the systems are installed and working. The installation process and the accompanying documentation are certainly issues to be taken into consideration by the developers of the systems. As a matter of fact, all the criteria in this category seem suitable for evaluating AEH authoring systems. As seen before, the timeliness metric is perhaps questionable.
A very important issue to attend to is the support of different dimensions for presenting information, evaluated through the last criterion. The evaluation showed that not all the systems pay enough attention to this aspect. The application of the generic framework in combination with the test lesson on MOT, AHA! and WHURLE revealed some insights into the questionable nature of certain criteria. The static versus dynamic criterion can be called into question, as every AEH system provides, by definition, adaptive features. Perhaps also questionable are the metrics latency and validity of the criterion on input acquisition. One could argue that latency is no longer an issue in ICT, as modern computers do not have the speed problems computers used to have in the past. Making valid assumptions is a feature for which computer systems are well known. For this reason, the decision could be made to exclude this metric from the framework. The same applies to the criterion inferences drawn. The ease of installation is perhaps questionable, as it is not a fixed issue in the range of user concerns. The last questionable metric in the framework belongs to the acceptance of adaptation criterion: the timeliness of adaptation. This metric can be considered questionable for the same reasons mentioned for the latency metric before.
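The reasoning used above, in which a criterion becomes a candidate for the label 'questionable' when it receives the same grade on every system, can be sketched in Python as follows. The function name and the last table row are hypothetical; the first three rows use the ++ grades reported in this section, and the sketch is not the procedure actually followed in the study.

```python
def non_discriminating(grades_by_criterion):
    """Return the criteria whose grade is identical on every system."""
    return sorted(
        name
        for name, grades in grades_by_criterion.items()
        if len(set(grades.values())) == 1
    )


table = {
    "data-independency": {"MOT": "++", "AHA!": "++", "WHURLE": "++"},
    "interoperability": {"MOT": "++", "AHA!": "++", "WHURLE": "++"},
    "static versus dynamic": {"MOT": "++", "AHA!": "++", "WHURLE": "++"},
    # Hypothetical row with made-up grades, added only to show a contrast.
    "hypothetical criterion": {"MOT": "+", "AHA!": "++", "WHURLE": "-"},
}

print(non_discriminating(table))
```

A uniform grade is only a signal for closer inspection: as argued for data-independency and interoperability, a criterion can remain in the framework even when it does not separate the systems under test.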

5.4.2 Remarks on the testing

All statements made on the criteria in this chapter are, to a certain extent, subjective. However, it has to be taken into account that this is only one research, in which one researcher conducts one test using one lesson. Other researchers should come to the same conclusions, conducting a test with the evaluation framework presented here. Statements on criteria and metrics being definite or questionable are preliminary, again because of the single character of the study. The criteria which, in particular, now seem questionable, call for further attention. The main goal of the research described in this report is to develop a framework for evaluating AEH authoring applications. This framework is applied in this chapter. Decisions had to be made on which categories of criteria to include in the framework and how to measure them. All in all, this generic framework covers the whole field of aspects on AEH authoring systems, divided in three categories. Obviously, one could argue about the presence or the absence of certain criteria. After further research, as, for instance, conducting more tests on different systems, the framework is able to evolve. However, chapter 4 made clear that the criteria in this framework are both necessary and sufficient for this framework to be generally applicable on AEH authoring systems. One could argue why an evaluation on, for instance, the use and/or support of different learning styles is not included in the framework. The reason for this is the fact that the framework is supposed to assess AEH authoring applications on design aspects. Pure educational issues such as the one concerning learning styles would corrupt the framework to a certain extent.

CHAPTER 6. Conclusions and recommendations

The framework resulting from both the literature study and the testing on real AEH authoring applications gives rise to the statements made in this chapter. The first section describes conclusions based on both the testing of the different systems and the development of the evaluation framework.

6.1 Conclusions

This section describes conclusions on both the framework in relation to the original research goal and the corresponding research questions as well as the actual application of the framework on MOT, AHA! and WHURLE.

6.1.1 General conclusions

The original goal of the research was to design an evaluation framework in order to assess AEH authoring applications. This evaluation model has to contribute to the problem formulated in chapter 1: "the lack of evaluation frameworks for Adaptive Educational Hypermedia authoring applications". The main research question deals with the criteria by which AEH authoring applications can be evaluated. In order to be able to answer this question, several research subquestions were formulated. Before the final evaluation framework is presented, this section briefly discusses each research question. The second chapter describes the current state of AEH technology, as posed in research question one. Research questions two to four ask for the three categories of criteria to evaluate AEH systems. The issues and quality factors that lead to the criteria on which the framework is built are described in chapters two (section 2.3) and three.

The evaluation framework, which contains the criteria, is described in chapter four. Research question six asks for an update of the first version of the evaluation model. This process has not been explicitly reviewed in this report; the framework in chapter four is also the final framework. The recommendations that have to answer research question seven are described partly in chapter five and are repeated and elaborated on in this chapter.

One of the outcomes of chapter 2 is the taxonomy of adaptive hypermedia adaptation techniques described by Brusilovsky [15]. This model is used to analyse MOT, AHA! and WHURLE on their adaptive features (see section 2.2.4). The result of this analysis acts as input for some of the criteria in the generic framework. For instance, the appropriateness of adaptation can be assessed by looking at the number of available adaptation techniques for a certain adaptation decision.
Deploying Brusilovsky's taxonomy also revealed that inserting, removing and sorting of fragments and adaptive link hiding are the basic kinds of adaptation techniques. Almost every AEH authoring system incorporates them and one could argue that no such system could do without them. Table 2.2 shows that only MOT offers adaptive multimedia presentation and adaptation of modality, both techniques used for presenting course materials through more than one dimension. AHA! and WHURLE are up for improvement on this point. In a broader sense, all systems should perhaps reconsider the set of techniques they currently adopt. There certainly is a gain in using a wider range of methods and techniques, as this allows teachers to create more diverse lessons. The third chapter presents quality factors from different perspectives to eventually select a set of them for building a generic evaluation framework. Both Gupta & Grover [49] and Weibelzahl [75] proved that a layered approach to evaluation works. Karagiannidis et al. [18] were the first to introduce a layered approach, which separates the essential parts of the adaptation process from each other, i.e., the input acquisition from the making of the adaptation decision. Cristea and Garzotto [36] described the design aspects they consider, and proved, to be essential for any AEH authoring system: the design of the content domain, the instructional view, the detection mechanism and the user and adaptation model.

6.1.2 Results of the evaluation

MOT scores high on both technological and educational criteria and a bit less on end-user criteria. This shows MOT is a technology driven system that has to be translated into a more user-friendly application. The idea of the developers of MOT was, in fact, just this, as they deliberately did not add a presentation module for teaching purposes (an interface for students). The presentation for teachers, however, should be made clearer, as teachers and course developers are, generally speaking, not technological experts. A novice user cannot, for instance, be expected to understand the difference between concepts and lessons. At first glance, AHA! seems the more all-round system. It receives high values in all three categories, but, of course, still contains some flaws. This system is also a technology driven system, though better evolved in the direction of the end-user. In fact, AHA! is the only system to receive a positive total assessment in the category of end-user criteria. WHURLE more or less comes out of the test as the inferior system of the three. The reason for this could partly lie in the fact that the system was not used as MOT and AHA! were, for creating lessons. The WHURLE test lesson was created in MOT and then transferred to the WHURLE system. The reason for this is the complicated way of creating lessons in WHURLE in combination with the fact that the system is not (yet) designed to create lessons. In fact, WHURLE does not provide an editor or interface for creating course materials and adaptive structures. WHURLE unarguably contains certain flaws, such as the unreliability of the system in general and the improper features of certain presentation aspects. However, it has to be stressed that WHURLE does have a solid conceptual structure, which is, in theory, well supported by the technological aspects.

All three systems score well on educational criteria; this is explained by the fact that they were all constructed from some great ideas of concepts and adaptive linking mechanisms. This process is translated into a solid technical foundation, as can be derived from the relatively high values in the category of technological criteria. The main issue remains the 'failure' of MOT, AHA! and WHURLE to present a proper interface: easy, simple and, for instance, accompanied by a decent manual. Table 5.1 shows that MOT and AHA! are approximately equally 'good'. AHA! seems somewhat more developed, which is not striking, as this system has been around longer than MOT. As all three systems originate from the Minerva/ADAPT project, they are highly data-independent, highly interoperable and well suited for reusing and sharing concepts, adaptive patterns, and suchlike.

In order for AEH authoring applications to be widely accepted by the educational world, usability is an issue of major importance. MOT, AHA! and WHURLE all are technology driven systems, developed by IT researchers and, as such, are not the most user-friendly from the perspective of teachers or course developers. Although they offer various tools and editors for creating course materials, they fail to present a system that really invites the user to perform this task. A point for improvement is the interface, which sometimes is too difficult or technical for someone who is not an IT specialist. The lack of a proper manual is another often observed weak spot. Although it is not part of the evaluation framework, there is an observation made on a pure educational issue. Another result from the fact that AEH systems are developed by researchers with mostly IT experience, and little or no educational background, is the low level of educational value of these systems. With a few exceptions current AEH systems are paying too little attention to issues as, for instance, cognitive styles. Often noticed is the effort to implement and adopt certain learning styles into an application. As such, this is an excellent development. However, this process cannot be fulfilled without the involvement of real educational knowledge, preferably from specialists in this field, who understand the issues concerned. The existing systems give the impression of being slightly artificial, as if the developers want to put in 'a or any learning style'.

Table 6.1 presents the final generic framework. This evaluation model is the ultimate result of the research described in this report. Questionable criteria are emphasised, to direct attention towards them.

The framework presented here is more complete than evaluation methods developed in other studies, because it includes the concerns of all the stakeholders. The evaluation framework developed by Gupta & Grover [49], for instance, is designed for evaluating AH systems, instead of educational AH systems. Furthermore, their framework treats the evaluation of an AH system as an integral part of the development process, whereas the framework presented in this research aims at fulfilling the concerns of the three stakeholders. From this emerges the most important reason: the framework of Gupta & Grover does not evaluate the concerns of the end-user. This lack of attention to end-user aspects is also noticeable in the framework Weibelzahl [75] proposes. In fact, he refers to this in the recommendations on his own work. Another aspect omitted from the framework of Weibelzahl is an evaluation of the technological background of a system. Weibelzahl also investigates the different aspects of the adaptation process less thoroughly. More than once, this report cites Cristea and Garzotto [36], who state that the main issues of AEH authoring lie in the design. In order to develop a 'good' system, a proper design is the most important factor. The main design dimensions they formulate (among others, design of the content domain, of the detection mechanism and of the learner model) are all adopted and elaborated on in the framework developed in this research. The three groups of criteria Devedzic [43] mentions (conceptual, technological and tools-related) are a close match to the three domains of issues proposed in this research. The improvement lies in the fact that this research uses the issues described by Devedzic as a starting point.

6.2 Recommendations

The research and the resulting generic framework described in this report are, of course, not the end of research in evaluating AEH authoring applications. This section discusses several aspects which can act as a foothold for other researchers.

6.2.1 Further research

In order for the framework to evolve, further research is needed, especially on criteria labelled questionable. Other studies might use other selection methods to assign criteria to each of the technological, educational and end-user domains. The value of the framework can be improved when more systems are evaluated, more test lessons are formulated and assessed, more test persons are involved, or a combination of these. A drawback of this research is the fact that all the evaluated systems originate from the Minerva/ADAPT project, and are, as such, equal in many respects. This is another reason for conducting research on other AEH authoring systems.

6.2.2 Implications for MOT, AHA! and WHURLE

The results from the tests show that MOT, AHA! and WHURLE are technology driven systems in which the technological background of the developers is clearly noticeable. To a certain extent, all three applications are up for improvement in the area of user-friendliness. Some might consider this a minor issue, but in order to stimulate teachers or course developers to use AEH systems, they have to be workable systems. In order to allow teachers to create more diverse lessons, for instance, by adopting more cognitive styles, MOT, AHA! and WHURLE should support more adaptive techniques and multiple dimensions to present course materials.

6.2.3 Adaptivity and eCommerce

A parallel can be drawn between adaptivity in eLearning and the process of personalisation of websites in eCommerce. Just as students are supposed to achieve better when supported by an intelligent system, companies expect customers to spend more money when presented with a customised browsing environment. Amazon (www.amazon.com) was one of the pioneers in this field.

They started to present different lists of books to each user, which were based on previous visits, books already bought at former visits and books bought by other users with the same interests. Nowadays, personalisation is one of the major challenges the eCommerce society faces. In fact, all the (key) issues that deal with eCommerce can, to a certain extent, be traced back to personalisation, and thus adaptivity. The outcome of the research described in this report is valuable to the e-Commerce area, as parts of the framework, such as the evaluation of the user-, domain- and adaptivity model and the input acquisition, can be used for evaluating eCommerce applications.

6.3 Related Work

As there is broad agreement on the lack of evaluation frameworks for AEH applications, several studies describe a method to at least be able to comment on a given system. Although common opinion states that evaluations of the usefulness and effectiveness of adaptation, both within and between systems, should be one of the major directions in current AEH research, studies on other evaluation aspects are carried out as well. The research of Muntean and McManis [63] describes an evaluation study on AHA! where the outcome suggested an extra layer on every AEH application. This extra Quality of Experience (QoE) layer has to provide satisfactory end-user QoE by taking into account the environment the user operates in. For instance, if a user is on a low bandwidth connection, the system provides less detailed pictures.
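The low-bandwidth example can be made concrete with a small Python sketch. It is not taken from [63]; the function name, the variant names and the bandwidth thresholds are all hypothetical and serve only to show the kind of decision such a layer would make.

```python
def choose_image_variant(bandwidth_kbps, variants):
    """Pick the most detailed image variant the connection can support.

    variants is a list of (min_bandwidth_kbps, name) pairs. If no variant
    fits, the least demanding one is returned as a fallback.
    """
    eligible = [v for v in variants if v[0] <= bandwidth_kbps]
    if not eligible:
        return min(variants)[1]
    return max(eligible)[1]


# Hypothetical variants and thresholds, for illustration only.
VARIANTS = [(0, "thumbnail"), (256, "medium"), (2000, "full")]

for measured in (56, 512, 5000):
    print(measured, choose_image_variant(measured, VARIANTS))
```

The adaptation here depends on the user's environment rather than on the user's knowledge, which is what distinguishes a QoE layer from the user-model driven adaptation discussed in the rest of this report.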

References

[1] M. Aksit and B. Tekinerdogan. Design of software architectures, 2004. course 211133, University of Twente.

[2] C. Alexander, S. Ishikawa, and M. Silverstein. A Pattern Language. Oxford University Press, New York, 1977.

[3] AUSWEB97 - The third Australian conference on the world wide web. Adaptive Textbooks on the WWW, 186–192, Queensland, Australia, 1997.

[4] Y. Baek, C. Wang, and S. Lee. Adaptive hypermedia educational system based on XML technologies. In ED-Media '02 world conference on educational multimedia, hypermedia & telecommunications. Association for the advancement of computing in education (AACE), 2002.

[5] E-Learningplaza V2: Barometer. Online available on: http://www.e-learningplaza.nl/elp2/ includepagina.asp?url=/ELP2/barometer/barometer.asp.

[6] P. de Bra, A. Aerts, B. Berden, B. de Lange, B. Rousseau, T. Santic, D. Smits, and N. Stash. AHA! the Adaptive Hypermedia Architecture. In Proceedings of the 14th ACM Conference on Hypertext and Hypermedia, pages 81–84, 2003.

[7] P. de Bra, P. Brusilovsky, and G.J. Houben. Adaptive hypermedia: From systems to framework. ACM Computing Surveys, 31(4es)(12.37), December 1999.

[8] P. de Bra et al. Adaptive hypertext & hypermedia home page. Online available on: http://wwwis.win.tue.nl/ah/.

[9] P. de Bra et al. AHA!, Adaptive Hypermedia for All! http://aha.win.tue.nl.

[10] P. de Bra, T. Santic, and P. Brusilovsky. AHA! meets Interbook, and more... Paper, 2002.

[11] P. de Bra, N. Stash, and D. Smits. Creating adaptive applications with AHA!; tutorial for AHA! version 3.0. In L. Aroyo and C. Tasso, editors, AH 2004: Tutorials, 2004. CS-report 04-20.

[12] T.J. Brailsford, H.L. Ashman, C.D. Stewart, M.R. Zakaria, and A. Moore. User control of adaptation in an automated web-based learning environment. In First International Conference on Information Technology & Applications (ICITA 2002). Bathurst, Australia, 25–28 November 2002.

[13] T.J. Brailsford, A. Moore, C.D. Stewart, M.R. Zakaria, B.S. Choo, and P.M.C. Davies. Towards a framework for effective web-based distributed learning. Poster, 1–5 May 2001. 10th International World Wide Web Conference, Hong Kong.

[14] P. Brusilovsky. Methods and techniques of adaptive hypermedia. User Modeling and User-Adapted Interaction, 6(2–3):87–129, 1996.

[15] P. Brusilovsky. Adaptive hypermedia. User Modeling and User-Adapted Interaction, 11(1–2):87–110, 2001.

[16] P. Brusilovsky. Developing adaptive educational hypermedia systems: From design models to authoring tools. In T. Murray, S. Blessing, and S. Ainsworth, editors, Authoring Tools for Advanced Technology Learning Environment. Dordrecht: Kluwer Academic Publishers, 2003.

[17] P. Brusilovsky, J. Eklund, and E.W. Schwarz. Web-based education for all: A tool for development adaptive courseware. Computer Networks, 30(1):291–300, 1998.

[18] P. Brusilovsky, C. Karagiannidis, and D. Sampson. The benefits of layered evaluation of adaptive applications and services. In S. Weibelzahl, D.N. Chin, and G. Weber, editors, Empirical Evaluation of Adaptive Systems, Proceedings of Workshop, pages 1–8. UM2001: 8th International conference on user modeling, 2001.

[19] P. Brusilovsky, E. Schwarz, and G. Weber. ELM-ART: An intelligent tutoring system on World Wide Web. In C. Frasson, G. Gauthier, and A. Lesgold, editors, Proceedings of Third International Conference on Intelligent Tutoring Systems, ITS-96. Lecture Notes in Computer Science, volume 1086, pages 261–269. Berlin: Springer Verlag, 1996.

[20] P. Brusilovsky, E. Schwarz, and G. Weber. A tool for developing adaptive electronic textbooks on WWW. In Proceedings of WebNet'96, World Conference of the Web Society, San Francisco, CA, pages 64–69, 1996. Available online at http://www.contrib.andrew.cmu.edu/plb/WebNet96.html.

[21] L. Calvi and A.I. Cristea. Towards generic adaptive systems: Analysis of a case study. AH2002, Adaptive Hypermedia & Adaptive Web-based Systems, LNCS 2347, Springer, 79–89, 2002.

[22] A.I. Cristea. Authoring of adaptive and adaptable educational hypermedia: Where are we now and where are we going? unpublished.

[23] A.I. Cristea. MOT, My Online Teacher. Online available on: http://e-learning.dsp.pub.ro/motadapt.

[24] A.I. Cristea. Adaptive patterns in authoring of educational adaptive hypermedia. Educational Technology & Society, IEEE Learning Technology Task Force, 6(4):1–5, October 2003. Formal discussion summary.

[25] A.I. Cristea. Adaptive course creation for all. In International Conference on Information Technology: Coding and Computing Volume 1, pages 718–722. ITCC'04, IEEE, Las Vegas, Nevada, US, April 5–7 2004.

[26] A.I. Cristea. Evaluating adaptive hypermedia authoring while teaching adaptive systems. SAC'04, ACM, Nicosia, Cyprus, 2004.

[27] A.I. Cristea and L. Aroyo. Adaptive authoring of adaptive educational hypermedia. In Adaptive Hypermedia and Adaptive Web-Based Systems, pages 122–132. AH 2002, LNCS 2347, Springer, 2002.

[28] A.I. Cristea and P. de Bra. Enhancing the WWW towards adaptive and adaptable ODL environments. unpublished.

[29] A.I. Cristea and L. Calvi. The three layers of adaptation granularity. In UM03, Pittsburgh, US, 2003.

[30] A.I. Cristea and P. Cristea. Evaluation of adaptive hypermedia authoring patterns during a socrates programme class. International Peer-Reviewed On-line & Print Journal "Advanced Technology For Learning", 2004.

[31] A.I. Cristea and A. de Mooij. Adaptive course authoring: MOT, My Online Teacher. In ICT'03, Tahiti island in Papeete, French Polynesia. IEEE LTTF, "Telecommunications + Education" Workshop, Feb–Mar 2003. In press.

[32] A.I. Cristea and A. de Mooij. Designer adaptation in adaptive hypermedia authoring. In ITCC'03, Las Vegas, US. IEEE Computer Science, 2003.

[33] A.I. Cristea and A. de Mooij. Evaluation of MOT, an AHS authoring tool: URD checklist and a special evaluation class. In CATE'03 (International Conference on Computers and Advanced Technology in Education) Rhodos, Greece, pages 241–246. IASTED, ACTA Press, 2003.

[34] A.I. Cristea and A. de Mooij. LAOS: Layered WWW AHS authoring model and their corresponding Algebraic Operators. In WWW'03, Budapest, Hungary, 2003.

[35] A.I. Cristea et al. Minerva project: ADAPT. Online available on: http://wwwis.win.tue.nl:8080/ acristea/HTML/Minerva/index.html.

[36] A.I. Cristea and F. Garzotto. ADAPT major design dimensions for educational adaptive hypermedia. In ED-Media '04 world conference on educational multimedia, hypermedia & telecommunications. Association for the advancement of computing in education (AACE), 2004.

[37] A.I. Cristea and F. Garzotto. Designing patterns for adaptive or adaptable educational hypermedia: A taxonomy. In ED-Media '04 world conference on educational multimedia, hypermedia & telecommunications. Association for the advancement of computing in education (AACE), 2004.

[38] A.I. Cristea and Kinshuk. Considerations on LAOS, LAG and their integration in MOT. In ED-MEDIA'03, Honolulu. AACE, 2003.

[39] A.I. Cristea and T. Okamoto. My English Teacher – a WWW system for academic English teaching. In ICCE 2000, International Conference on Computer in Education, Learning Societies in the New Millennium: Creativity, Caring and Commitments. Taipei, Taiwan, 2000.

[40] A.I. Cristea, C. Stewart, and H. Ashman. Authoring and delivering adaptive hypermedia courseware. Minerva/ADAPT, 2002.

[41] A.I. Cristea and M. Verschoor. The LAG grammar for authoring the adaptive web. In ITCC'04, IEEE, Las Vegas, US, 2004.

[42] P. Crosby. Quality is Free. McGraw-Hill, 1979.

[43] V.B. Devedzic. Key issues in next-generation web-based education. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, 33(3), 2003.

[44] J. Fink, A. Kobsa, and J. Schreck. Personalized hypermedia information provision through adaptive and adaptable system features. http://zeus.gmd.de/hci/projects/avanti/publications/ISandN97/ISandN97.html, 1997.

[45] International Organisation for Standardisation. ISO 9241: Ergonomics of human-system interaction, parts 10 & 12. http://www.userfocus.co.uk/resources/iso9241 and http://www.iso.org (paid access).

[46] M. van Geloven, R. Koper, and J. van der Veen. E-learning trends. http://hdl.handle.net/1820/212, 2004.

[47] E. Gilbert, R. Hübscher, and S. Puntambekar. Assessment methods in web-based learning environments & adaptive hypermedia. International Journal of Artificial Intelligence in Education, 12:1020–1029, 2001. AIED2001 Workshop.

[48] R.B. Grady and D.L. Caswell. Software Metrics: Establishing a Companywide Program. Prentice-Hall, 1987.

[49] A. Gupta and P.S. Grover. Proposed evaluation framework for adaptive hypermedia systems. In L. Aroyo and C. Tasso, editors, AH 2004: Workshop Proceedings Part I, pages 158–168. 3rd International Conference on Adaptive Hypermedia and Adaptive Web-Based Systems, August 2004.

[50] P. Karampiperis and D. Sampson. Adaptive hypermedia authoring: From adaptive navigation to adaptive learning support. In AH 2004: 3rd International Conference on Adaptive Hypermedia and Adaptive Web-based Systems, Workshop Proceedings Part II, pages 449–454, 2004.

[51] D.A. Kolb. Experiential Learning: Experience as the Source of Learning and Development. Prentice-Hall, 1984.

[52] G. Kouroupetroglou, M. Stamati, and C. Metaxaki-Kossionides. World-Wide-Web supporting for the learning of non-orthographic languages. In Proceedings of the 7th International Conference on Computers in Education, pages 995–1004, Chiba, Japan, 1999. ICCE99.

[53] M. Kravcik, M. Specht, and R. Oppermann. Evaluation of WINDS authoring environment. In P. de Bra and W. Nejdl, editors, Adaptive Hypermedia and Adaptive Web-Based Systems, number 3137 in LNCS, pages 166–175, Eindhoven, the Netherlands, August 2004. Third International Conference, AH 2004, Springer.

[54] J. McCall, P. Richards, and G. Walters. Factors in Software Quality. NTIS AD-A049-014, 015, 055, November 1977. three volumes.

[55] R. Min. Transistor example -- 3 dimensional. Online available on: http://projects.edte.utwente.nl/pi/Examples/TransistorARNS/index.htm.

[56] R. Min. Websites in Education; Important Types of Educational Websites. online available at: http://projects.edte.utwente.nl/pi/eBookW/Home.html, 2003.

[57] R. Min and I. de Diana. An evaluation plan for a combined product and process evaluation in authoring adaptive courseware. Twente part of the Minerva/ADAPT project, 2004.

[58] R. Min and I. de Diana. Interim report, information about the twente courseware. Adapt/Minerva project, Twente part, 2004.

[59] R. Min, I. de Diana, and N.J.C. Primus. Questionnaire to evaluate AEH authoring systems, 2004.

[60] A. Mitrović, B. Martin, and M. Mayo. Using evaluation to shape ITS design: Results and experiences with SQL-Tutor. International Journal of User Modeling and User-Adapted Interaction, 12(2–3):243–279, 2002.

[61] A. Moore and T. Brailsford. Whurle, Web-based Hierarchical Reactive Learning Environment. http://whurle.sourceforge.net/.

[62] A. Moore, C.D. Stewart, D. Martin, T.J. Brailsford, and H. Ashman. Links for learning: Linking for an adaptive learning environment. In 3rd IASTED International Conference on Web-Based Education - WBE 2004. Innsbruck, Austria, February 16–18, 2004.

[63] C.H. Muntean and J. McManis. End-user quality of experience layer for adaptive hypermedia systems. In L. Aroyo and C. Tasso, editors, AH 2004: Workshop Proceedings Part I, pages 87–96. 3rd International Conference on Adaptive Hypermedia and Adaptive Web-Based Systems, August 2004.

[64] T. Okamoto, A.I. Cristea, and M. Kayama. Future integrated learning environments with multimedia. Journal of Computer Assisted Learning, University of Electro-Communications, Tokyo, 17(1):4–12, 2001. Invited paper.

[65] R. Oppermann, R. Rashev, and Kinshuk. Adaptability and adaptivity in learning systems. Knowledge Transfer, pAce, London, UK, II:173–179, 1997.

[66] A. Paramythis, A. Totter, and C. Stephanidis. A modular approach to the evaluation of adaptive user interfaces. In S. Weibelzahl, D.N. Chin, and G. Weber, editors, Empirical Evaluation of Adaptive Systems, Proceedings of Workshop, pages 9–24. UM2001: 8th International conference on user modeling, 2001.

[67] R.S. Pressman. Software Engineering: A Practitioner's Approach. McGraw-Hill, 1997.

[68] J. Roschelle, C. DiGiano, M. Koutlis, A. Repenning, J. Phillips, N. Jackiw, and D. Suthers. Developing educational software components. IEEE Computer, 32(9):50–58, 1999.

[69] A.S.G. Smith and A. Blandford. MLTutor: An application of machine learning algorithms for an adaptive web-based information system. Int. J. Artif. Intell. Educ., to be published.

[70] N. Stash and P. de Bra. Building adaptive presentations with AHA! 2.0. paper, 2002.

[71] C. Stewart, A.I. Cristea, and A. Moore. Authoring and delivering adaptive courseware. In L. Aroyo and C. Tasso, editors, AH 2004: Workshop Proceedings Part II, pages 408–418. 3rd International Conference on Adaptive Hypermedia and Adaptive Web-Based Systems, August 2004.

[72] E. Triantafillou, A. Pomportsis, and E. Georgiadou. AES-CS: Adaptive educational system based on cognitive styles. In P. Brusilovsky, N. Henze, and E. Millán, editors, Proceedings of the AH'02 Workshop on Adaptive Systems for Web-based Education, 2002.

[73] P. Verschuren and H. Doorewaard. Het Ontwerpen van een Onderzoek. Uitgeverij LEMMA BV, Utrecht, 2000.

[74] S. Weibelzahl. Evaluation of Adaptive Systems. PhD thesis, University of Trier, Germany, 2003. Dissertation.

[75] S. Weibelzahl and C.U. Lauer. Framework for the evaluation of adaptive CBR-systems. In I. Vollrath, S. Schmitt, and U. Reimer, editors, Experience Management as Reuse of Knowledge. Proceedings of the 9th German Workshop on Case Based Reasoning, GWCBR2001, pages 254–263, Baden-Baden, Germany, 2001.

Appendix A The Minerva/ADAPT project

In the Information Society of the new millennium, the use of Information and Communication Technology (ICT) is becoming essential for the rapid dissemination of information in general, and knowledge in particular. In this context, Open and Distance Learning (ODL) will have a growing role in effectively training people to take an active role in society, as a precondition for fostering real equality among them. However, as the reach of education expands, equal opportunities are created and the principle of lifelong learning is put into practice, there is an increasing need to handle cultural, linguistic and gender differences and to analyse learners' attitudes and profiles. With these premises, this project's main objective is to establish a European platform of standards (guidelines, techniques and tools) for user modelling-based adaptability and adaptation, in the sense of the new paradigm of intelligent human-computer interaction, based on the new generation of ODL tools, and using methods and techniques of, among others, artificial intelligence and neural networks, towards individualisation of the learning process. In this way, the project's main contribution is in going one step further than user modelling and comprehension, by focusing on creating a common structure for the adaptive response of ODL systems to specific user needs and by working towards a basis for modern European education. To reach this goal, the work of the IEEE Learning Technology Task Force (LTTF) and the Learning Technology Standards Committee (LTSC), which develop standards related especially to Learner, Content, and Data and Metadata, will be taken into consideration.
Another contribution of the project is that it will research and devise guidelines, tools and techniques for adaptability and adaptation for ODL in distributed learning environments, that is, environments where the learners, the teachers, the learning material and learning tools, or a combination of these, are distributed over a wide geographic area, with a special focus on the European dimension and the educational use of the Internet. Within the user adaptability and adaptation paradigm, the project will also attend to the systematisation of the authoring of adaptive courseware, as well as to the particular aspects of user adaptation of multimedia educational resources. Finally, this project should generate quality indicators for adaptable and adaptive user modelling and courseware authoring in this field, with the aim of being able to evaluate the mapping of new ICT methods and practices for adaptation onto pedagogical considerations as well as learners' cognitive profiles. The proposed project will thereby contribute significantly not only to standardising the new generation of ODL adaptation techniques and tools, but also to providing alternative styles of teaching and learning, optimally suited for ODL or a combination of ODL and traditional classroom teaching. The resulting guidelines and standards will therefore be of use first to educators and developers of educational material, by providing generic as well as more specific guidelines for user-adaptation mechanisms, while ultimately benefiting learners and trainees. An important aspect of the project is that it brings together the experience of partners from different European countries that have been and are involved in the development, usage and evaluation of adaptive environments, targeted at different categories of learners from different social environments and cultures. Their expertise will be used for studies and comparative analysis of the particular aspects of adaptable and adaptive ODL and the use of new educational technologies towards user-adaptation standards for education, in the evaluation of the project outcome, and in the promotion and dissemination of the project results at European level.
The following objectives will be pursued:

O1. Identify a set of relevant good practices of (user modelling based) adaptation techniques for education, based on current technology such as, e.g., artificial intelligence and neural networks.

O2. Identify a set of bad practices (or techniques) of (user modelling based) adaptation techniques for education to make a clear distinction from the good ones.

O3. Extract a minimal set of relevant and necessary features for adaptation techniques in education. Extract typical features for distributed (Internet) environments, multimedia environments.

O4. Extract a supplementary set of relevant (but not necessary/essential) features for adaptation techniques in education.

O5. Extract a set of irrelevant features for adaptation techniques in education (or sets of redundant techniques).

O6. Based on O3-O5, define guidelines (minimal set of requirements) for an authoring system for adaptive techniques in education.

O7. Build a prototype adaptive authoring tool and, separately, one (or more) training system(s) based on the minimal set of relevant features, with possible addition of supplementary features.

O8. Evaluate the adaptive prototype system on different target groups.

O9. Disseminate and promote the results.

All the objectives mentioned here are studied by all the participants in the project: the Eindhoven University of Technology (the Netherlands), the University of Nottingham (UK), the University of Southampton (UK), the Politecnico di Milano (Italy), Centro per la Ricerca Scientifica e Tecnologica (Trento, Italy) and the University of Twente (the Netherlands). The sixth and seventh objectives (O6 and O7) are studied in particular by Eindhoven and Nottingham, and the eighth objective (O8) receives special attention from members of the Twente team.

Appendix B Issues in the Minerva/ADAPT project

As made clear in appendix A, the Minerva/ADAPT project is based on the cooperation of researchers in Eindhoven (the Netherlands), Nottingham (UK), Southampton (UK), Milano (Italy), Trento (Italy) and Twente (the Netherlands). The development of new tools is taken on by the project members in Eindhoven (MOT and AHA!) and Nottingham (WHURLE), whereas the Twente team mainly focuses on evaluation aspects. In order to evaluate the three test systems, the Twente team is dependent on the work of the other project members. At the start of the project (October 2002), Min and De Diana [57] developed a transistor circuits demonstration course, which contains several chunks of information, three different dimensions (text, visuals and an applet) and a test to determine the student's knowledge level. The purpose was to provide the developers of the test systems with a simple though adequate measuring tool: by implementing the demonstration lesson, they could easily assess their own system. Min and De Diana implemented their own transistor course using JavaScript. The resulting demonstration lesson [55] is capable of showing adaptation techniques as well as multiple information dimensions. As such, this JavaScript lesson forms a formidable and challenging 'opponent' for the Minerva/ADAPT systems. Over the years, not a single (transistor) test lesson was created by the researchers from either Eindhoven or Nottingham.

One and a half years after the start of the project, I joined the group of Behavioural Sciences to perform the graduation research described in this report. The main goal of the research was to develop an evaluation framework, and part of the assignment was an evaluation of MOT, AHA! and WHURLE. All Minerva/ADAPT developers reacted positively, and all of them assisted in setting up the different test systems on computers in Twente. However, it turned out that I had to create the test lessons myself. Only the developers of AHA! provided an example of their own transistor test course. Because installation was time-consuming and involved dealing with many different kinds of web servers, interpreters, databases and the like, testing stopped about two or three months after the start of the graduation project. At that stage, the transistor test course had been implemented in MOT and AHA! and transferred from MOT to WHURLE. However, no tests were included to determine the knowledge level of the students. By this time, the Minerva/ADAPT project had officially come to an end.

It is interesting to do a feature-by-feature comparison between one of the ADAPT systems, which is a true AEH authoring system, and the JavaScript transistor demonstration course, which is not. To a certain extent this comparison is not fair, as the JavaScript product is tailored specifically for this purpose as a one-off demonstration course. It contains no reusable or shareable materials and uses no generic, predefined adaptation rules. MOT, AHA! and WHURLE, in contrast, 'suffer' from offering only standard objects with which to create the test lesson. In fact, 'tricks' have to be used to include multiple dimensions in these systems, as they do not provide an editor for each dimension. In such a comparison, the JavaScript product is able to show all its capabilities and possibilities. But, of course, MOT, AHA! and WHURLE are the future of AEH authoring.

Appendix C Design Patterns

This appendix provides some background on design patterns. These are considered an important factor in the design of AEH authoring applications by Cristea [37], one of the leaders of the Minerva/ADAPT project (see appendix A).

C.0.1 Design patterns in general

Christopher Alexander stated in 1977 [2]: "(a design pattern) ... describes a problem which occurs over and over again in our environment, and then describes the core of the solution to that problem, in such a way that you can use this solution a million times over, without ever doing it the same way twice". Design patterns are thus reusable design solutions to recurrent design problems. The idea behind patterns in general is to take the struggle with implementation techniques away from the creators of software programs. This applies at any level. In the educational field, for instance, creating educational software is often a barrier for teachers because they lack experience with and knowledge of programming languages. Authoring environments such as MOT, AHA! and WHURLE offer teachers tools for authoring their course material without requiring them to understand anything about programming.

C.0.2 Patterns in authoring AEH

According to Cristea and Calvi [29], there are three different layers of adaptation techniques. The lowest layer consists of direct adaptation rules, which are based on, for instance, the hiding and showing of links and nodes (information units) in hypermedia. Condition-action rules are often used in this layer. The medium layer is formed by the adaptation language, which groups the techniques of the lowest layer in, for instance, IF-THEN statements. The highest layer is called adaptation strategies, where adaptation is reached by specifying user models. Within these models, lower-level techniques can be used. Every layer acts as a wrapper for the layers below it. Programmers can use the lowest-layer techniques to have the most control over their work, whereas authors without programming experience can restrict themselves to the highest-level techniques, which make it easy for them to create lessons.
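
The three layers described above can be illustrated with a minimal Python sketch. It is not taken from MOT, AHA!, WHURLE or the LAG grammar; all names (`Rule`, `if_then`, `beginner_first_strategy`, `render`), the knowledge threshold and the node naming scheme are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical types: a user model maps concept names to a knowledge level
# in [0, 1]; a presentation maps node or link names to their visibility.
UserModel = Dict[str, float]
Presentation = Dict[str, bool]


# Lowest layer: a direct condition-action rule that shows or hides one node.
@dataclass
class Rule:
    condition: Callable[[UserModel], bool]
    node: str
    show: bool

    def apply(self, user: UserModel, page: Presentation) -> None:
        if self.condition(user):
            page[self.node] = self.show


# Medium layer: an IF-THEN construct that groups several low-level
# show/hide actions under one shared condition.
def if_then(condition: Callable[[UserModel], bool],
            actions: Dict[str, bool]) -> List[Rule]:
    return [Rule(condition, node, show) for node, show in actions.items()]


# Highest layer: a ready-made strategy that an author selects by name and
# parameterises with a concept, without writing any rules directly.
def beginner_first_strategy(concept: str, threshold: float = 0.5) -> List[Rule]:
    def knows(user: UserModel) -> bool:
        return user.get(concept, 0.0) >= threshold

    def novice(user: UserModel) -> bool:
        return not knows(user)

    return (
        if_then(novice, {f"{concept}/intro": True, f"{concept}/advanced": False})
        + if_then(knows, {f"{concept}/intro": False, f"{concept}/advanced": True})
    )


def render(rules: List[Rule], user: UserModel) -> Presentation:
    """Apply every rule in order and return the resulting visibility map."""
    page: Presentation = {}
    for rule in rules:
        rule.apply(user, page)
    return page
```

In this sketch an author would only call `beginner_first_strategy("transistor")`, while a programmer could construct individual `Rule` objects for full control; each layer wraps the one below it, as in the description above.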

One of the goals in present research on AEH is to identify more general patterns of adaptive behaviour and make them reusable via an authoring environment that offers these patterns, or even groups of patterns (adaptation strategies) [24].

Glossary

adaptability: a system which provides the user with tools that make it possible to change the system characteristics.

adaptation: the process of adapting to the characteristics and needs of the user (student).

adaptation decision: information processing step where the user (student) properties are used to adapt the interface.

adaptation model: a software model containing information on the adaptation strategies, adaptation rules and adaptation language. Defines how user (student) actions are translated into user model updates and into generation of an adapted presentation of a requested page.

adaptation technique: the way the system yields the adaptation, subdivided in adaptive presentation and adaptive navigation support.

adaptivity: an interactive system that changes its behaviour depending on the individual user's (student) behaviour on the basis of nontrivial inferences from information about the user.

AEH: Adaptive Educational Hypermedia; a computer program which supports or even replaces the traditional classroom teacher. It offers each user course materials personalised to his or her characteristics, (cognitive) abilities, knowledge levels, and so on. The contents contain hyperlinks to other course materials, which can be anything from text to visuals, sound fragments, videos or applets, and are not ordered in any fixed way. On the contrary, the ordering is subject to change even during the lesson, according to the state of the user model.

AEH authoring system: an application to create Adaptive Educational Hypermedia.

AH: Adaptive Hypermedia. Hypermedia that automatically personalise themselves to each user.

AHA!: Adaptive Hypermedia Architecture or Adaptive Hypermedia for All. An AEH authoring application, used in this report as test system.

attribute: an element of a concept which contains information like keywords, indicators or the actual information.

authoring: the development of course materials and lesson plans.

chunk: a small construct (discrete unit), typically containing a single media item (e.g., a paragraph of text).

concept: virtual objects (software) that contain course materials.

contents: here: course materials. Also referred to as content.

content management: processes to support a user in gathering, storing and displaying information.

course developer: a person who creates course materials and lesson plans. Here: user.

course management: student support processes that involve no intelligence, e.g., discussion groups, providing materials and schedules.

domain model: a software model containing a conceptual description of the application's contents. Contains concepts and concept relationships.

dynamic: here: A system which incorporates adaptation through adaptive processes.

educational system: software application written specifically for learning purposes.

eLearning: electronic highway learning. The combination of all different forms of learning in which computer systems are involved.

end-user: the teacher or course developer.

environment: a software application.

hypermedia: a contraction of the words hypertext and multimedia. The words hypermedia and hypertext are often used as synonyms. Although hypertext suggests that all information is in the form of plain text, most hypertext systems allow the use of information in other forms, such as graphics, sound, animation and/or video.

inferences drawn: the conclusions a system draws on the basis of the input acquisition.

input acquisition: the process of gathering information about the student. Examples: measuring his activity through counting key strokes or his knowledge level by performing a test.

ITS: Intelligent Tutoring System, see AEH

learning materials: see course materials

learning style: the preferred way in which an individual 'learns'.

Minerva/ADAPT: project funded by the EU, whose goal is to "establish a European platform of standards (guidelines, techniques and tools) for user modeling-based adaptability and adaptation".

MOT: My Online Teacher. An AEH authoring application, used in this report as test system.

multimedia: the use of several different media (six dimensions, e.g., text, visuals, audio) to convey information.

personalisation: the process of individually tailoring a product or a service to a user. Generic term for adaptivity and adaptability.

static: here: A system which incorporates adaptation, but only is adaptable, and not adaptive.

system: a software application.

teacher: see course developer

user model: a software model consisting of a set of concepts with attributes. Contains an overlay model, which means that for every concept in the domain model there is a concept in the user model.

web: here: the internet, or WWW. Computer network consisting of a collection of internet sites (web pages) that offer multimedia resources through the hypertext transfer protocol.

web-based: a system developed in such a way that it can be connected to the internet.

web page: a document connected to the World Wide Web and viewable by anyone connected to the internet who has a web browser.

WHURLE: Web-based Hierarchical Universal Reactive Learning Environment. An AEH authoring application, used in this report as test system.

Enschede, March 10, 2005.

Enschede; placed online by Rik Min: March 24, 2005. Paper version available from Niels Primus. PDF file available from Rik Min or Niels Primus.