Research Article

The potential of following-up international large-scale assessment studies: using PISA 2018 to develop a comprehensive model of effective teaching

Maria Vrikki, Leonidas Kyriakides & Andria Dimosthenous
Received 03 Sep 2022, Accepted 12 Apr 2024, Published online: 24 Apr 2024

ABSTRACT

The paper investigates the potential of using international large-scale assessment studies for conducting follow-up studies that test models of educational effectiveness. The impact on student literacy achievement of teacher factors drawn from the dynamic model of educational effectiveness and from dialogic education theory is examined. Modern Greek teachers from secondary schools in Cyprus that participated in PISA 2018 (see https://www.oecd.org/pisa/data/pisa2018technicalreport/) were recruited. Two Grade 11 classes per teacher were selected, giving 392 students in total. Observation data from Greek lessons and student literacy achievement data were collected. Two multilevel regression analyses were conducted: one with the whole student sample and one with only the students for whom a PISA prior achievement measure could be identified. When prior achievement (i.e., the PISA measure) was considered, additional and larger effects of factors on student final achievement were detected. Both analyses reveal that more variance in student achievement was explained when factors from both frameworks were considered.

1. Introduction

The fields of educational effectiveness research (EER) and of international large-scale assessment (ILSA) studies serve similar agendas, since both aim to examine factors that contribute to student achievement (Kyriakides, Charalambous & Charalambous, 2022). The paper reports on a study which explores the possibilities of establishing strong links between the two fields. The study is concerned with teacher effectiveness factors, and two theoretical frameworks are used: the Dynamic Model of Educational Effectiveness (DMEE) and educational dialogue theory. The research presented here has two aims: (1) to evaluate whether EER factors drawn from these two theoretical models of teaching effectiveness can explain variation in student achievement and (2) to determine the value of ILSA studies and the potential of using them for developing and testing theoretical models of EER. The paper starts by describing the value and drawbacks of using ILSA studies for developing theoretical models of EER. The theoretical framework of this study is then described, and the importance of establishing a comprehensive model of effective teaching is discussed. Next, we present the aims and research questions of the study. Finally, the methods and findings of the study are presented and implications for educational effectiveness research are drawn.

2. The value and drawbacks of using ILSA studies to develop theoretical models of EER

The value of datasets emerging from ILSA studies is certainly undeniable. This is evident in the increasing number of countries choosing to take part in such studies in recent decades (Rutkowski et al., 2010). Key ILSA studies include the Progress in International Reading Literacy Study (PIRLS) and the Trends in International Mathematics and Science Study (TIMSS), introduced by the International Association for the Evaluation of Educational Achievement (IEA). Since 2000, the Programme for International Student Assessment (PISA), introduced by the Organisation for Economic Co-operation and Development (OECD), has taken place every three years in participating countries and aims to assess the reading, mathematical and scientific literacy (with a particular focus on one of them each time) of 15-year-old students. ILSA studies include questions that aim to elicit information on students’ achievement, and background information from students, teachers, and their schools (Rutkowski et al., 2010). In addition, they collect data on students’ attitudes towards school and learning, leisure habits and school climate. PIRLS and PISA contain a questionnaire for parents or guardians, while TIMSS and PIRLS contain questionnaires for teachers regarding their preparation and school climate, and for principals regarding school resources and other school-level issues (Rutkowski et al., 2010).

The databases, therefore, provide a wealth of information. In fact, they have the power to deepen our understanding of different education systems, to enable comparisons between them, and to generate fruitful discussions around educational policies (Azigwe, 2016; Caro, Kyriakides & Televantou, 2018). Much of the literature consists of studies that conduct secondary analyses with the aim of identifying associations between student achievement and certain other factors (Song, Perry & McConney, 2014). As ILSA studies are observational, however, one can only talk about correlations and not causal relationships (Rutkowski et al., 2010). Furthermore, the cross-sectional design has serious limitations when it comes to creating new knowledge for educational theory (Caro et al., 2018). This paper aims to examine the strengths of utilising datasets from ILSA studies as a baseline measurement in longitudinal designs. We examine this in the context of teaching effectiveness, addressing the main theories that dominate this field. To do so, we use the 2018 PISA database in Cyprus, search for the impact of teacher factors drawn from two different theoretical models of effective teaching, and explore the possibilities of establishing an integrated approach to effective teaching.

The strong links between the field of EER and ILSA are evident in various ways (Kyriakides et al., 2022). EER is concerned with the identification of factors that operate at different levels (i.e., student, class/teacher, school and system) and explain variation in student achievement. ILSA measures different types of student learning outcomes by considering student background factors across countries, with the aim of comparing the functioning of different educational systems and helping policy makers in different countries raise awareness of the importance of education and of the effects of specific factors on promoting student learning outcomes. In this context, more recent ILSA studies place emphasis on process variables drawn from effectiveness factors included in EER models. With regard to teacher factors, ILSA has recently started investigating the impact of not only generic but also content-specific teaching behaviours, which aligns with the conclusions of recent meta-analyses of EER highlighting the importance of considering both generic and content-specific teaching practices (Kyriakides et al., 2020b; Scheerens, 2013).

Researchers in the field of EER have conducted secondary analyses of various ILSA studies in order to identify factors associated with student learning outcomes (Khine et al., 2020). This is due to the larger variance in the functioning of possible predictors/factors that can be observed, and to the large samples of ILSA studies, which provide sufficient statistical power to detect associations between factors and student learning outcomes. Researchers may also conduct not only within-country but also across-country secondary analyses in order to test the generalisability of the findings. Thus, these secondary analyses may contribute to the development of the field both theoretically and methodologically.

In a systematic review study, Kyriakides (2016) aimed to identify the contribution of PISA to educational research, as well as to draw implications for the design of PISA. Searching four scientific databases (ERIC, ERA, Scopus, ProQuest), Kyriakides (2016) found 808 peer-reviewed papers published from 2000 to 2015 that used PISA. The majority of these studies (57%) used PISA for secondary analysis, while only 2.4% of them presented follow-up studies. The purpose of these secondary analyses is typically to identify associations between student achievement and certain other variables. For example, using PISA 2009 data, Lafontaine et al. (2015) found significant correlations between students’ school-dependent opportunities to learn in reading and their reading achievement.

Utilising such databases for secondary analysis, however, has several methodological limitations that need to be addressed. The first relates to the cross-sectional design of ILSA studies, which does not include a prior achievement measure (Caro et al., 2018; Cordero & Gil-Izquierdo, 2018). This design does not allow researchers to consider whether the associations found between student achievement and other variables are confounded by prior achievement. It can increase the risk of inverse causality, that is, of obtaining unexpected findings, or the risk of finding no effects when in fact they do exist. Gustafsson (2013), for example, reported weak or negative correlations between student achievement and time spent on homework, while Caro (2011) reported negative correlations between student achievement and parental involvement in schools. Caro et al. (2018) argue that these negative correlations could have arisen from remedial practices undertaken by teachers and/or parents in response to low student performance. Similarly, it would be difficult to argue whether a teacher’s particular strategy is producing high-achieving students, or whether the teacher is using it because s/he has high-achieving students (O’Dwyer, Wang & Shields, 2015). To address the omitted prior achievement bias issue, Caro et al. (2018) included a prior achievement measure for students in England who had participated in the Progress in International Reading Literacy Study (PIRLS). In their study, a teacher questionnaire was used to elicit data on how often teachers use certain activities.

The second limitation relates to studies that have attempted to identify associations between self-reported instructional activities and student achievement (Mullis et al., 2012). Most of these studies have taken advantage of the OECD’s Teaching and Learning International Survey (TALIS), which collects information on the practices of lower secondary teachers and principals and has been combined with the PISA database (e.g., Kaplan & McCarty, 2013). Cordero and Gil-Izquierdo (2018) used the TALIS-PISA link database and found that traditional, teacher-centred teaching practices led to higher student proficiency in mathematics, while more innovative, student-centred and active learning strategies had a negative impact on student mathematics achievement. The authors argue that the latter type of teaching practice aims primarily to increase students’ critical thinking, teamwork skills and other non-cognitive skills, which are not measured in ILSA studies. Le Donné, Fraser and Bousquet (2016) also used the TALIS-PISA database to examine the effect of teaching practices on students’ achievement in eight countries. They found that “cognitive activation strategies and, to a lesser extent, active learning strategies, have a strong association with student achievement in mathematics” (Le Donné, Fraser & Bousquet, 2016, p. 4). This association was found to be weaker in schools with socio-economically disadvantaged students.

There are two methodological issues with using the TALIS-PISA dataset to identify factors of effective instruction. The first concerns the fact that both the TALIS and the PISA databases operate at the school level (Cordero & Gil-Izquierdo, 2018), so examining the exact effect of teaching conducted at the class/teacher level is not possible. These studies, therefore, acknowledge that they are working with data aggregated across all the teachers in the same school. The second relates to the self-reported measure of teaching practices. Teaching consists of highly complex events, which cannot be adequately captured by questionnaire data. Direct classroom observations conducted after students’ participation in ILSA studies would be a more precise measure. Following teachers over time may not be possible, so a few studies have attempted to follow students (e.g., Baumert & Demmrich, 2001). However, this research design has not been used systematically, which can be attributed to practicality issues such as gaining access to schools where ILSA studies took place during the previous year and the difficulty of matching, at the student level, the data that emerged from ILSA with the data that researchers collect using their own instruments during the follow-up study. Nevertheless, this paper investigates the potential of following students participating in ILSA studies over time to identify the impact of teaching factors on students’ final achievement. Specifically, we use the ILSA study data to generate a valid and reliable measure of students’ prior achievement, and we collect data on the functioning of teacher factors using valid and reliable instruments developed to measure these factors. To our knowledge, this approach has not been used to date, possibly because of the challenges of matching participants to their data from the ILSA study. This challenge may lead to a reduced sample. Avoiding this problem may increase the cost of the follow-up study, since the initial sample should be large enough to ensure that the matched sample will provide sufficient statistical power.

3. Framework of the study

In this study we aim to examine the extent to which factors coming from two different theoretical frameworks of effective teaching can explain student achievement variance in the context of higher secondary education in Cyprus, as well as the potential of utilising PISA, an ILSA study, in this process. More specifically, two theoretical frameworks of teaching effectiveness are considered: (a) the DMEE and (b) the educational dialogue theory. It is also examined whether an integrated approach to effective teaching can be proposed. These two theoretical models of effective teaching are presented and the reasons for combining these two approaches are discussed.

3.1. The Dynamic Model of Educational Effectiveness

The DMEE (Creemers & Kyriakides, 2008) considers factors affecting student achievement that are located at four different levels (student, classroom, school, system) and was developed to establish stronger links between EER and teacher and school improvement. In this paper, the impact of classroom-level factors on student achievement is examined. At the classroom level, the DMEE refers to eight factors concerning teachers’ instructional behaviour which were found to be associated with student learning outcomes. An integrated approach to effective teaching is adopted, since the DMEE refers to factors associated both with the direct and active teaching approach (e.g., structuring, application) and with constructivism (e.g., orientation, modelling). Multiple theories of learning are considered in defining the eight factors. For example, motivation theories of learning and cognitive load theory are considered in defining orientation and application, respectively. The eight teacher factors are briefly presented below (for more details see Kyriakides et al., 2020a):

  1. Orientation: This factor concerns the ways teachers encourage students to identify the reasons why a series of lessons, a single lesson, or specific teaching tasks take place. Engaging students with orientation tasks may encourage them to participate actively in the classroom, since the tasks that take place become meaningful for them (e.g., De Corte, 2000; Paris & Paris, 2001). This factor is, therefore, in line with motivation theories of learning.

  2. Structuring: Student achievement is maximised when teachers not only actively present material but also structure it by: (a) beginning with overviews and/or reviews of objectives, (b) outlining the content to be covered and signalling transitions between lesson parts, (c) calling attention to main ideas, and (d) reviewing main ideas at the end. Thus, this factor aims to identify whether teachers offer structuring tasks which not only facilitate memorising of the information but also allow for its apprehension as an integrated whole, with recognition of the relationships between parts (Case, 1993; Scheerens, 2013).

  3. Questioning: Given that several questions are raised in each lesson, this factor aims to identify the questioning skills of teachers. The difficulty level and the clarity of the questions are examined. Teachers raise both product questions (i.e., expecting a single response from students) and process questions (i.e., expecting students to provide explanations), but effective teachers ask more process questions. Finally, this factor is concerned with the way the teacher deals with student responses (correct, partly correct, and incorrect answers) and aims to find out whether constructive feedback is provided to students.

  4. Modelling: This factor comes from research on teaching higher-order thinking skills, especially problem solving, and is in line with constructivism. It is concerned with the extent to which teachers help students use strategies and/or develop their own strategies which can help them solve different types of problems (Kyriakides, Anthimou & Panayiotou, 2020a). Students are also encouraged to develop skills that help them organise their own learning (Kraiger, Ford & Salas, 1993).

  5. Application: Drawing on cognitive load theory, this factor investigates the opportunities provided to students for practising and applying their knowledge. This factor is associated with the direct teaching model (Rosenshine, 1983), which emphasises immediate exercise of topics taught during the lesson and direct feedback provided by the teacher at either an individual or a group level. It is also examined whether students are simply asked to repeat what they have already covered with their teacher or whether the application task is more complex than the content covered in the lesson, perhaps even serving as a starting point for the next step in teaching and learning.

  6. Time management: Teachers are expected to organise and manage the classroom as an efficient learning environment and thereby maximise engagement rates (e.g., Creemers & Reezigt, 1996; Scheerens, 2013; Wilks, 1996). In this context, the DMEE treats management of time as one of the most important indicators of a teacher’s ability to manage the classroom effectively.

  7. Classroom as a learning environment: The DMEE refers to the teacher’s contribution to creating a learning environment in his/her classroom, and five elements of the classroom as a learning environment are taken into account: (a) teacher-student interaction, (b) student-student interaction, (c) students’ treatment by the teacher, (d) competition between students, and (e) classroom disorder. The first two elements are important components of measuring classroom climate, as classroom environment research has shown (Cazden, 1986; den Brok, Brekelmans & Wubbels, 2004). However, the DMEE refers to the types of interaction that exist in a classroom rather than to how students perceive teacher interpersonal behaviour. The other three elements refer to teachers’ attempts to create a businesslike and supportive environment for learning.

  8. Assessment: The DMEE places emphasis on conducting assessment for formative purposes. Assessment is also seen as an integral part of teaching (Delandshere, 2002; Stenmark, 1992). Thus, information gathered from student assessment should enable teachers to identify their students’ needs as well as to evaluate their own teaching practice (Christoforidou & Kyriakides, 2021; Krasne et al., 2006).

These factors are measured across five dimensions (i.e., frequency, focus, stage, quality, and differentiation), which enable us to capture both quantitative and qualitative characteristics of the functioning of each factor. For example, the stage dimension is concerned with the period during which tasks associated with each factor are observed, whereas differentiation emphasises the extent to which the different needs of students are considered in providing tasks associated with each factor. The dimensions are not only important from a measurement perspective but also, and even more so, from a theoretical point of view. For example, the focus dimension is related to synergy theory (see Liu & Jiang, 2018), whereas the stage dimension implies that the factors need to take place over a long period of time to ensure that they have a continuous direct or indirect effect on student learning. The differentiation dimension is based on findings of differential effectiveness research, which reveal that adaptation to the specific needs of each group of students may increase the successful implementation of a factor and ultimately maximise its effect on student learning outcomes. Therefore, the DMEE attempts to describe the complex nature of effective teaching both by pointing out the importance of specific factors and by explaining how the functioning of each factor can promote student learning.

The validity of the DMEE at classroom level has been tested by more than 20 empirical studies conducted mainly in Europe but also in some non-European countries such as Canada, Ghana and the Maldives (for a review of these studies see Kyriakides et al., 2020b). These studies revealed that the teaching factors and their dimensions are associated with student achievement. A synthesis of teacher effectiveness studies has also provided empirical support for the DMEE. The empirical studies testing the DMEE collected data on students’ prior and final achievement, with curriculum-based tests developed and administered to measure both. In this paper, prior achievement data emerged from student responses to the PISA study; in this way, not only is the study less expensive, but issues associated with testing and re-testing are also avoided. In addition, the predictive validity of PISA is examined. It is also important to note that only three of these empirical studies examined the impact of the teacher factors of the DMEE on the achievement of lower secondary school students. The main reason for conducting studies mainly in primary schools is that EER shows that the younger the students, the larger the teacher effect (Kyriakides et al., 2020b; Scheerens, 2013). The study presented here is concerned with the achievement of students in upper secondary schools. The closer students are to the end of secondary education, the more likely teachers are to prepare them for university entrance exams rather than support them in developing higher-order thinking skills. Since the eight factors of the DMEE are considered generic, the impact of these factors on the achievement of upper secondary school students should be examined, and the extent to which PISA can help us achieve this aim is investigated through this study. We finally aim to explore the extent to which the DMEE can be expanded by considering teaching factors associated with educational dialogue theory.

3.2. Educational dialogue theory

The value of specific forms of classroom dialogue that can support student learning is increasingly being acknowledged in research (Mercer, Wegerif & Major, 2020). These forms of dialogue go beyond the traditional “Initiation-Response-Feedback” (IRF) format, which consists of a teacher’s initiation typically involving a closed question, a brief student response, and the teacher’s evaluative feedback (Sinclair & Coulthard, 1975). In dialogic classrooms, students are encouraged to participate in whole-class and groupwork discussions. They are not only expected to contribute their ideas, but also to co-construct and refine ideas in collaboration with their peers (Littleton & Mercer, 2013). Productive forms of dialogue have been referred to in the literature using various terms, depending on the specific approach adopted. These include “dialogic teaching” (Alexander, 2018), “accountable talk” (Michaels & O’Connor, 2015), and “dialogic inquiry” (Wells, 1999). In this paper, we use the term “educational dialogue” to encompass productive dialogue forms.

A number of dialogue forms have been identified as productive in the literature. One form concerns teacher questioning. Instead of asking questions that only require students to recall information, teachers should use open questions that prompt students’ thinking (e.g., Nystrand et al., 2003). Examples would be questions that ask for students’ justifications, predictions, hypotheses, or elaborations; these can be questions that initiate discussion or questions that follow up on students’ contributions (Vrikki & Evagorou, 2023). Another productive indicator concerns the nature of students’ contributions. Instead of providing brief and pre-determined answers, students should have opportunities for extended contributions that include elaborations and justifications of their answers (Hennessy et al., 2016; Muhonen et al., 2018). A third indicator of productive dialogue involves attempts to bring multiple ideas together. This can be done by identifying links or highlighting differences between ideas so that students can collectively pursue lines of inquiry (Howe et al., 2019; Vrikki et al., 2019). Such moves can be made by the teacher or even by the students, for instance when they work in groups and have had experience of using such dialogue forms. Achieving educational dialogue is not an easy task. It requires an appropriate climate and culture in the classroom, one that is based on respect between all dialogue participants and on feeling safe to express ideas. Developing this collective environment, therefore, requires time and persistence on the teacher’s part.

4. Research aims

The aim of this paper is to examine the extent to which factors of teaching effectiveness emerging from the two theoretical models, namely the DMEE and the educational dialogue theory, can explain variation in student achievement. In addition, we examine the potential of using ILSA studies to further understand relationships between teacher factors and student achievement. To examine these aims, we formulated the following research questions:

  1. To what extent do each of the eight factors of the DMEE explain variation in student achievement?

  2. To what extent do the factors of the educational dialogue theory explain variation in student achievement?

  3. To what extent can more variance in student achievement be explained by considering factors of both the DMEE and educational dialogue theory?

  4. To what extent can the above research questions be addressed through a follow-up study of PISA rather than through cross-sectional studies?

5. Methods

5.1. Participants

The target population of this study was students in Cyprus who had participated in the PISA 2018 study. These were Grade 11 students in secondary education (see Note 1); they had participated in PISA 2018 at the end of Grade 10.

For the sample recruitment, we invited all secondary schools in Nicosia and Larnaca (n = 14) which had participated in PISA 2018. Nine schools agreed to take part in the study, and all Modern Greek teachers (the subject includes language and literature) of these nine schools participated: 21 teachers in total (4 male, 17 female). For each participating teacher, we recruited his/her Grade 11 students. If a teacher taught more than one classroom, we randomly selected two classes per teacher. The total sample of students was 392 (57.4% female and 42.6% male) and their average age was 17.03. Although the student sample is not nationally representative, a chi-square test did not reveal any statistically significant difference at the .05 level between the sample and the student population of Cyprus in terms of gender (X2 = 1.01, d.f. = 1, p = 0.31). Moreover, a t-test did not reveal any statistically significant difference between the research sample and the student population in terms of age (t = 0.82, d.f. = 8396, p = 0.41). Therefore, the student sample had the same characteristics as the national population in terms of gender and age.
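For readers who wish to reproduce this kind of representativeness check, the sketch below shows the shape of the computation in Python with SciPy. The gender counts follow the stated sample percentages, but the national proportions and the age vectors are invented placeholders, not the study's data.

```python
# A minimal sketch (not the authors' code) of the representativeness checks
# reported above; all population figures below are hypothetical placeholders.
import numpy as np
from scipy import stats

# Gender: chi-square goodness-of-fit of sample counts against population proportions.
observed = np.array([225, 167])                  # 57.4% female, 42.6% male of n = 392
population_props = np.array([0.55, 0.45])        # hypothetical national proportions
expected = population_props * observed.sum()     # expected counts under the population distribution
chi2, p = stats.chisquare(observed, f_exp=expected)
print(f"gender: X2 = {chi2:.2f}, df = 1, p = {p:.2f}")

# Age: independent-samples t-test between sample and population ages.
rng = np.random.default_rng(0)
sample_ages = rng.normal(17.03, 0.4, 392)        # placeholder data
population_ages = rng.normal(17.00, 0.4, 8006)   # placeholder data (df = 392 + 8006 - 2 = 8396)
t, p = stats.ttest_ind(sample_ages, population_ages)
print(f"age: t = {t:.2f}, df = {len(sample_ages) + len(population_ages) - 2}, p = {p:.2f}")
```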

5.2. Data collection

The following data were collected from these classrooms:

5.2.1. Observational data of quality of teaching

To measure the quality of teaching, data related to the DMEE and to educational dialogue theory were collected via lesson observations. Two observation schemes were used for the collection of these data.

First, a high-inference tool was used for the collection of data on the DMEE. The high-inference observation instrument covers the five dimensions (i.e., frequency, focus, stage, quality, and differentiation) of all factors of the model except assessment, and the observer is expected to complete a Likert scale indicating how often each teacher behaviour was observed. This tool, along with other observation tools, was tested for its construct validity in more than 15 teaching effectiveness studies conducted in different countries (including Cyprus) using structural equation modelling techniques (see Kyriakides et al., 2020b for a review of the validity of these instruments). Second, the “Dialogue Observation Scheme” (hereafter, the DOT) was used for the collection of data on educational dialogue. The tool (also described in Vrikki & Kyriakides, 2018; Vrikki, Kyriakides & Anastasou, accepted) focuses on interactions that involve the teacher in three contexts: teacher-whole class, teacher-student groups, and teacher-individual students. It uses a time-sampling technique, which requires the observer to alternate between minutes of observation and minutes of rest. During minutes of observation, the observer makes a note only when s/he observes one of five dialogue moves in one of four different forms. The dialogue moves are elaboration, reasoning, coordination, referencing back and challenging. These can be observed as statements, as opening invitations, as invitations for students to act on their own ideas, or as invitations for students to act on others’ ideas. The tool also considers four reactions on the part of the teacher when students do not respond as expected to a teacher invitation: dropping the question, repeating exactly the same question, addressing another student or the whole class, or reformulating the question in simpler words. The tool was validated in previous work using the Rasch model (see Vrikki & Kyriakides, 2018).
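To make the move-by-form coding scheme concrete, the following minimal sketch shows one way such time-sampled codes could be represented and tallied per lesson. It is an illustration only, not the authors' instrument or software: the event format and the alternating-minute rule are assumptions made for the example.

```python
# A minimal sketch of tallying DOT-style dialogue codes; the odd/even minute
# rule and the (minute, move, form) event format are assumptions for this example.
from collections import Counter

MOVES = {"elaboration", "reasoning", "coordination", "referencing_back", "challenging"}
FORMS = {"statement", "opening_invitation", "invite_act_on_own_ideas", "invite_act_on_others_ideas"}

def tally_lesson(events):
    """events: (minute, move, form) tuples; here odd minutes are observed, even minutes are rest."""
    counts = Counter()
    for minute, move, form in events:
        if minute % 2 == 0:                      # skip resting minutes
            continue
        if move in MOVES and form in FORMS:      # note only the five moves in the four forms
            counts[(move, form)] += 1
    return counts

print(tally_lesson([(1, "reasoning", "opening_invitation"),
                    (1, "challenging", "statement"),
                    (2, "elaboration", "statement")]))   # minute 2 is ignored as rest
```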

Two observers were present in each lesson observation. Observer A was responsible for the high-inference observation tool of the DMEE and Observer B was responsible for the DOT. While the two observers sat in on the same lessons, they did not have access to each other’s observation notes, so that they would not influence each other’s view of the lesson. Observer A had gone through training and had used the instruments in previous studies, which yielded high inter-rater reliability scores (Creemers & Kyriakides, 2012; Kyriakides et al., 2020b). The inter-rater reliability of Observer B was examined by asking her and another researcher (not Observer A) to rate 14 video-recorded lessons using the DOT. The two observers conducted simulated “live” observations, meaning that they watched the videos without pausing or rewinding. The two researchers’ codes were aggregated at the lesson level, and correlations were used to examine the agreement between them. Positive and satisfactory correlations were found for all variables.
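The agreement check just described amounts to correlating two raters' lesson-level aggregates for each DOT variable. The sketch below illustrates this with placeholder counts for the 14 video-recorded lessons, not the study's actual data.

```python
# A sketch of the inter-rater agreement check: one DOT variable, aggregated
# per lesson for each rater, then correlated. Counts are placeholders.
from scipy.stats import pearsonr

rater_b = [5, 3, 8, 2, 6, 7, 4, 9, 1, 5, 6, 3, 7, 4]   # e.g., reasoning invitations per lesson
rater_c = [4, 3, 9, 2, 5, 7, 5, 8, 1, 6, 6, 2, 7, 5]   # the second (non-Observer-A) researcher

r, p = pearsonr(rater_b, rater_c)
print(f"r = {r:.2f}, p = {p:.3f}")                      # one such correlation per DOT variable
```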

5.2.2. Students’ achievement in literacy data

At the end of the school year (June 2019), a literacy test developed as part of the present project was administered to the 392 Grade 11 students participating in the study. The literacy test was developed based on the Grade 11 curriculum and in consultation with a team of experts in Modern Greek in secondary education. The team guided our research team in terms of the structure of the test, the skills to assess, and appropriate topics to include. In addition, the test was based on skills that are examined by PISA. The test, therefore, was based not only on the curriculum of Cyprus but also on the expectations set for students in Cyprus in terms of skills considered by the PISA studies, such as extrapolating from what they have learnt and applying their knowledge in new situations (see PISA Technical Report, 2018).

The test was based on one of the main themes covered by the curriculum, namely “Social isolation – Racism (Discrimination – Prejudices)”. It started with a text comprehension part, which asked students to read an unfamiliar Greek text discussing the sources and consequences of racism. Students were then asked to respond to four multiple-choice questions and two short-answer questions which aimed to check their comprehension. A vocabulary part followed, which asked students to find synonyms, etymologies, and derivatives of words included in the text.

As the aim was for this test to be completed in a single teaching period (45 min), it was decided not to include a writing section, as this would be time-consuming. The validity of the test was examined using the Extended Logistic Model of Rasch (Andrich, 1988), which revealed a very good fit to the model. Specifically, for each scale, separation indices were higher than 0.85, indicating that the separability of each scale was satisfactory (Wright, 1985). Moreover, the infit mean squares and the outfit mean squares of each scale were near one, and the values of the infit t-scores and the outfit t-scores were approximately zero. Furthermore, each analysis revealed that all items had infit values within the range 0.83–1.15. Beyond the validation study, Item Response Theory (IRT) (Hambleton & Swaminathan, 1985) was also used in the main study to analyse the data that emerged from students’ responses to the test. Estimation was made using the Extended Logistic Model of Rasch, which revealed that the test had satisfactory psychometric properties. Therefore, achievement in literacy was estimated by calculating the Rasch person estimates.
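For readers unfamiliar with Rasch person estimates and infit/outfit statistics, the following minimal sketch illustrates the dichotomous case. The study used the Extended Logistic Model, which also handles partial-credit items; this simplified version, with invented item difficulties and responses, only shows the underlying computations.

```python
# A minimal sketch of dichotomous Rasch person estimation and fit statistics;
# item difficulties and the response pattern are invented for illustration.
import numpy as np

def rasch_person_estimate(x, b, iters=25):
    """Newton-Raphson MLE of ability theta given item difficulties b.
    Note: the MLE diverges for all-correct or all-wrong response patterns."""
    x, b = np.asarray(x, float), np.asarray(b, float)
    theta = 0.0
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(theta - b)))   # P(correct) for each item
        theta += (x - p).sum() / (p * (1.0 - p)).sum()
    return theta

def infit_outfit(x, b, theta):
    """Infit/outfit mean squares; values near 1 indicate good fit."""
    x, b = np.asarray(x, float), np.asarray(b, float)
    p = 1.0 / (1.0 + np.exp(-(theta - b)))
    var = p * (1.0 - p)
    outfit = np.mean((x - p) ** 2 / var)         # unweighted mean-square residual
    infit = np.sum((x - p) ** 2) / np.sum(var)   # information-weighted mean square
    return infit, outfit

x, b = [1, 1, 0, 1, 0], [-1.0, -0.5, 0.0, 0.5, 1.0]
theta = rasch_person_estimate(x, b)
print(round(theta, 2), infit_outfit(x, b, theta))
```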

5.2.3. Student background data

The literacy test was accompanied by a student background questionnaire, which was based on questions asked in the PISA study. The theoretical models of EER pay special attention to the effect of student-level factors on achievement. For this reason, student background factors, such as SES, gender, and ethnicity, are taken into account before the impact of teacher factors is examined (Creemers, Kyriakides & Sammons, 2010). Since we had to match individual responses to PISA and to the follow-up study, the student background data also enabled us to develop a code for matching purposes (see section 5.3.1). Thus, the questionnaire elicited information on students’ date of birth and gender, parents’ educational level (i.e., the highest educational qualification held by their mother and father) and student facilities, using the same questions as those included in the PISA student questionnaire. With regard to student facilities, a three-factor model (explaining 48% of the total variance) was derived from an exploratory factor analysis of students’ responses to the items concerning facilities available at home. Specifically, the following three factors were identified: (a) at-home resources (e.g., room, computer, quiet place to study), (b) cultural capital (e.g., poetry, literature, art), and (c) luxury items (i.e., jacuzzi, home cinema, alarm system). It is important to note that the third factor consists of country-specific items, while the first two factors include items common with other countries in the PISA study. High inter-item reliability was identified, with all item-total correlations within each of the three factors being highly significant (p < .001). Acceptable levels of internal consistency were indicated by Cronbach’s alpha coefficients ranging from .72 to .81 across the factors. Thus, these factors were taken as the operationalisation of student facilities.
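The internal-consistency check reported above can be illustrated with a short computation of Cronbach's alpha for the items loading on one facilities factor. The 0/1 item matrix below is simulated; the real coefficients (.72 to .81) come from the study's data.

```python
# A sketch of Cronbach's alpha for one facilities factor, on simulated items.
import numpy as np

def cronbach_alpha(items):
    """items: (n_students, n_items) array.
    alpha = k/(k-1) * (1 - sum of item variances / variance of total score)."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

rng = np.random.default_rng(1)
base = rng.integers(0, 2, size=(50, 1))                          # shared "facility level"
items = np.clip(base + rng.integers(-1, 2, size=(50, 4)), 0, 1)  # four correlated 0/1 items
print(round(cronbach_alpha(items), 2))
```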

5.3. Data analysis

5.3.1. Creating sub-samples for analysis

To answer the research questions of this study, it was decided that a comparison between two sub-samples was required: one sample with a prior achievement measure (based on responses to PISA 2018), and one sample without such a measure.

To create the sub-sample with the prior achievement measure, the open-access data on PISA 2018 in Cyprus were accessed. As all data in this database are anonymous, our research team asked the Cyprus Educational Research Centre of the Ministry of Education (KEEA), which is responsible for the database, to prepare a shorter version including only the data for the nine schools participating in our study; this database was still kept anonymous. In order to identify students from our sample of 392 who had participated in PISA 2018, we made use of the data from the students’ background questionnaire to create an individual code for each student. The code was based on students’ birth month, birth year, gender, mother’s and father’s educational level and the 16 student facilities (e.g., possession of a desk in one’s room, a quiet place to study, a computer, internet, literature). Using this code, 139 students were identified. It was not possible to match the rest of our student sample, either because not all of them had participated in the PISA 2018 study or because their questionnaire responses during the two data collections were not the same. In other words, of the 392 participating students in our study, we had a pre- and a post-measure for 139 students and a post-measure only for the remaining 253 students.
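The matching step amounts to building a composite key from background responses present in both datasets and joining on it. The sketch below shows this in pandas; the column names and toy rows are hypothetical (the real key used birth month/year, gender, parental education and the 16 facility items), so it only illustrates the idea.

```python
# A sketch of the matching procedure with hypothetical column names and rows.
import pandas as pd

KEY_COLS = ["birth_month", "birth_year", "gender", "mother_edu", "father_edu"]

def add_match_code(df):
    out = df.copy()
    out["match_code"] = out[KEY_COLS].astype(str).apply("|".join, axis=1)
    return out

pisa = add_match_code(pd.DataFrame(
    {"birth_month": [3, 7], "birth_year": [2002, 2002], "gender": ["F", "M"],
     "mother_edu": [4, 3], "father_edu": [4, 2], "pisa_score": [512.0, 467.0]}))
followup = add_match_code(pd.DataFrame(
    {"birth_month": [3, 11], "birth_year": [2002, 2002], "gender": ["F", "F"],
     "mother_edu": [4, 5], "father_edu": [4, 3], "literacy_rasch": [0.62, -0.10]}))

# An inner join keeps only students whose answers agreed across both collections,
# mirroring why only 139 of the 392 students could be matched.
matched = followup.merge(pisa[["match_code", "pisa_score"]], on="match_code")
print(matched[["match_code", "literacy_rasch", "pisa_score"]])
```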

Using the dataset with the 139 students, a statistically significant correlation was found between the PISA score and the Rasch score that emerged from student responses to the literacy test (r = .264, n = 139, p = .012). This highlights the predictive validity of PISA. Since the literacy test was curriculum-based, this result also suggests that PISA is not irrelevant to the curriculum in Cyprus. As more evidence supporting this finding is required, future research should investigate the predictive validity of ILSA studies in other countries as well. This work could also focus on the time-difference element by examining the effects of longer periods between the two administrations (i.e., longer than one year).

The 139 students came from 26 classes. In 12 of the classes, only a few students were identified. In order to ensure that this would not affect the outcomes of the analyses, Model 0 and Model 1 of the multilevel analysis were conducted once with all classes included and once with the 12 classes excluded. No differences were found between the results of the two analyses, so it was decided that the analysis could continue with all classes included.

Furthermore, in order to test for selection bias, the matched and unmatched groups of students were compared in terms of the student background variables. The chi-square test did not reveal any statistically significant difference in terms of gender (X2 = 0.08, df = 1, p = 0.78). Furthermore, t-tests did not reveal any statistically significant difference in terms of the three student facilities factors (at-home resources: t = −0.22, df = 386, p = 0.83; cultural capital: t = 0.70, df = 385, p = 0.48; luxury items: t = 0.02, df = 386, p = 0.99) or in performance on the literacy test administered by our team at the end of Grade 11 (t = −0.32, df = 379, p = 0.75).

5.3.2. Variables

Four types of variables were included in the data analysis:

  1. Student achievement defined by students’ scores on the literacy test administered at the end of Grade 11

  2. Student prior achievement defined by scores on the PISA 2018 test achieved by students who could be matched

  3. Quality of teaching factors defined by lesson observation data

  4. Background factors including gender, parental educational level, and student facilities.

5.3.3. Data analysis

Multilevel regression analysis was conducted in order to examine the effect of teaching quality on student achievement in literacy at the end of Grade 11. Using the MLwiN software, two different analyses were conducted:

  • Analysis 1 included the 392 students who participated in our 2019 study.

  • Analysis 2 included only the 139 students who were matched between the PISA 2018 database and our 2019 database.

The two analyses were comparable in terms of the variables included, as they differed only in the prior achievement measure. For both analyses, the data were conceptualised as a two-level model, consisting of students at the first level and classrooms at the second level. The first step in each analysis was to determine the variance at the student and classroom levels without explanatory variables (i.e., the empty model). Then, for both analyses, the following explanatory student background variables were added to the empty model: gender, parents’ education level, and the three indicators of socio-economic status (i.e., at-home resources, cultural capital, and luxury items). For Analysis 2, prior achievement (i.e., performance on PISA 2018) was also added. In models 2a–2j, the teaching quality factors of the DMEE and the dialogue variables were each added separately to model 1. Specifically, in model 2a, the variable measuring the structuring skills of teachers was added to model 1 and its effect on student final achievement was examined. In model 2b, the variable structuring was removed and another teacher factor of the DMEE (i.e., modelling) was added. In this way, the association of each teaching quality factor with student final achievement was examined separately. Finally, three versions of model 3 were developed. In model 3a, all factors of the DMEE were added, whereas model 3b was concerned with all the factors of educational dialogue but none of the DMEE. In model 3c, all factors of both frameworks were added in order to examine whether a greater percentage of student achievement variance would be explained by considering factors of both frameworks.
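The model-building sequence (empty model, background variables, one teaching factor at a time) can be illustrated with an analogue of the analysis. The study used MLwiN; the sketch below uses statsmodels' MixedLM on simulated data, with a random intercept for classrooms and students at level 1. All variable names and coefficients are invented for the illustration.

```python
# An illustrative analogue (not the study's MLwiN runs) of the two-level
# model sequence, on simulated data with invented names and effects.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n_class, n_per = 26, 6
n = n_class * n_per
df = pd.DataFrame({
    "classroom": np.repeat(np.arange(n_class), n_per),
    "gender": rng.integers(0, 2, n),
    "cultural_capital": rng.normal(0, 1, n),
    "pisa_2018": rng.normal(0, 1, n),                            # prior achievement (Analysis 2 only)
    "structuring": np.repeat(rng.normal(0, 1, n_class), n_per),  # classroom-level teaching factor
})
df["literacy"] = (0.30 * df.pisa_2018 + 0.25 * df.structuring
                  + np.repeat(rng.normal(0, 0.5, n_class), n_per)  # classroom random effect
                  + rng.normal(0, 1, n))                           # student-level residual

# Model 0 (empty), model 1 (background + prior achievement), model 2a (+ one factor).
for formula in ("literacy ~ 1",
                "literacy ~ gender + cultural_capital + pisa_2018",
                "literacy ~ gender + cultural_capital + pisa_2018 + structuring"):
    fit = smf.mixedlm(formula, df, groups=df["classroom"]).fit(reml=False)
    print(formula, "-> log-likelihood:", round(fit.llf, 1))
```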

6. Results

In the following sections, the main findings of the study concerning the impact of the teaching quality factors of the DMEE and the dialogue variables on student final achievement in literacy are presented.

6.1. The impact of teaching factors on students’ final achievement in literacy

This part presents the results of both multilevel regression analyses. Table 1 presents the results of Analysis 1, which included the 392 students without a prior achievement measure, and of Analysis 2, which included the 139 students who were matched between the PISA 2018 database and our 2019 database. While all teaching quality and dialogue factors were checked one by one as explained in section 5.3.3, Table 1 presents the results of only two teaching quality factors associated with the DMEE and one dialogue factor, for reasons of space.

Table 1. Parameter estimates and (standard errors) for both analyses of student achievement in literacy: Analysis 1 (A1) included the 392 students and Analysis 2 (A2) included the 139 students who were matched between the PISA 2018 database and our 2019 database.

The following observations arise from Table 1. First, a comparison of the empty models of the two analyses reveals a significant classroom effect, especially since the variance of student achievement situated at the classroom level was found to be higher than 25% of the total variance. Second, in model 1 of both analyses, background variables (i.e., gender, parents’ education level, at-home resources, cultural capital, and luxury items) were added to the empty model. In each analysis, gender and cultural capital were found to be associated with students’ final achievement in literacy. In Analysis 1 (see Table 1), parents’ education level was found to be a statistically significant background factor. This effect was eliminated in Analysis 2, which involved a smaller sample. This implies that when statistical power is reduced, the effect of some background variables might not be detected. For Analysis 2, prior achievement (i.e., performance on PISA 2018) was also added to the empty model and was found to have a statistically significant effect at the 0.05 level. It is also important to note that model 1 of Analysis 2, where prior achievement was included, explained 14.23% of the total variance, whereas model 1 of Analysis 1 explained only 9.67%. In the various versions of model 2, the teaching quality factors of the DMEE and the dialogue variables were each added separately to model 1. Table 1 presents the results of three of these alternative models (i.e., models 2a–2c). It is important to note that, regarding the quality of teaching, the factors concerned with structuring, modelling, time management and teacher-student interactions were found to have a statistically significant association with student final achievement in both analyses. In Analysis 2, orientation and classroom environment also had a statistically significant association. In addition, regarding the dialogue variables, two factors, namely teacher inviting students to reason and teacher inviting reasoning, had a statistically significant association with student final achievement in both analyses. Apart from these factors, in Analysis 2, teacher challenging and teacher inviting students to reason on others’ ideas also had a statistically significant association. Therefore, Analysis 2, which involved the 139 matched students and included PISA 2018 as the prior achievement variable, was able to detect more teacher factors associated with students’ progress; this suggests that the longitudinal aspect, namely following up students, matters. This finding is discussed in the next section.

Finally, three versions of model 3 were developed. In model 3a, all teacher factors of the DMEE were added to model 1, whereas in model 3b only the factors associated with the dialogic education framework were added. In the final version of model 3, all factors associated with both frameworks were added to model 1. This model was not only found to fit better than the other two versions of model 3 (i.e., 3a and 3b) but was also able to explain more variance than any other model (i.e., 27.3% of the total variance).

6.2. Effect sizes of the teaching quality factors of the DMEE and dialogue variables on students’ final achievement in literacy

To explore further the impact of the teaching quality factors of the DMEE and the dialogue variables on student final achievement in literacy, we estimated the effect sizes of the relevant statistically significant variables added separately to model 1. Specifically, we converted the fixed effect obtained from each multilevel analysis to standardised effects, or “Cohen’s d”, following the approach proposed by Elliot and Sammons (2004). It was found that the effect sizes of the teaching quality factors of the DMEE and the dialogue variables in Analysis 2 were larger than the corresponding effect sizes in Analysis 1 (see Table 2). For instance, the effect size of time management was d = 0.34 in Analysis 1, whereas it was d = 0.57 in Analysis 2. The effect size of the factor concerned with teacher inviting students to reason was d = 0.27 in Analysis 1, whereas it was d = 0.32 in Analysis 2. As can be observed in Table 2, the factors concerned with classroom as a learning environment, orientation, teacher challenging and teacher inviting students to reason on others’ ideas had no statistically significant effect on students’ achievement in literacy in Analysis 1. Therefore, positive results about the impact of teacher factors emerged from Analysis 2, which included the 139 students who were matched between the PISA 2018 database and our 2019 database.

Table 2. Effect sizes of the teaching quality factors of the DMEE and dialogue variables on student achievement in literacy.
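As a rough illustration of the conversion step described above, the sketch below standardises a multilevel fixed effect into a Cohen's-d-type quantity. The formula shown (twice the coefficient times the predictor SD, divided by the student-level residual SD) is one common conversion; the paper follows Elliot and Sammons (2004), whose exact procedure should be consulted. All numbers here are hypothetical.

```python
# A sketch of one common multilevel effect-size conversion; not necessarily
# the exact formula of Elliot and Sammons (2004). Inputs are hypothetical.
def multilevel_effect_size(coef, predictor_sd, residual_sd):
    """Standardise a fixed-effect coefficient from a two-level model."""
    return 2.0 * coef * predictor_sd / residual_sd

print(round(multilevel_effect_size(coef=0.15, predictor_sd=0.9, residual_sd=0.8), 2))
```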

7. Discussion

The present paper aims to discuss the main methodological limitations that arise in research when ILSA studies are used for secondary analysis purposes. Specifically, it examines the potential of using PISA as a prior achievement measure in a follow-up study to identify associations between teaching quality and student achievement. We examine this in the context of teaching effectiveness by focusing on two theoretical frameworks: (a) educational effectiveness research and (b) research on educational dialogue.

7.1. A comprehensive approach to teaching effectiveness research

As many previous studies in the field of EER have shown, the empty models of both analyses reveal a significant classroom effect on achievement (e.g., Kyriakides et al., 2020a; Kyriakides & Creemers, 2008; Panayiotou et al., 2014). This finding suggests that teachers matter, and their effect on promoting the learning outcomes of upper secondary students has been identified here. An additional contribution to the field, however, relates to the age group of the students participating in the study. Most EER studies focus on primary education and, to our knowledge, no studies to date focus on upper secondary education. To further establish the validity of generic factors of teaching effectiveness, it is crucial to find evidence that their generic nature is maintained throughout education. Furthermore, the use of a comprehensive model contributes further to the field of teaching effectiveness. Recent developments in this field call for the study of combinations of theoretical models of teaching effectiveness, so that the ways in which different models complement each other can be identified (Charalambous & Praetorius, 2020; Lindorff & Sammons, 2018). In this study, we combined the teacher effectiveness factors of the DMEE with the dialogic education framework, and effects were identified for factors from both models. Specifically, we found that model 3, which combines factors from both theoretical frameworks of teaching effectiveness, explained more student achievement variance than any other model. This finding suggests that a more comprehensive approach should be considered in the study of teaching effectiveness. This study also examined the value of ILSA studies; the relevant findings are discussed in the following section.

7.2. Optimal effects of ILSA studies

An important finding from this study is that, when ILSA studies are used as prior achievement measures, they can help us detect additional and larger effects on student achievement. In particular, when comparing the effects in Analyses 1 and 2 of the present study, we found that in Analysis 2 additional, as well as larger, effects of the two frameworks under investigation were identified. Even though the sample in Analysis 2 was smaller than in Analysis 1, which certainly reduced statistical power, it was easier to detect the effects of teaching factors; this shows that teaching factors are more important for explaining variation in student “progress” than variation in final outcomes. Study design, therefore, seems to have played a more important role in detecting the effects of teaching factors than the relatively smaller statistical power. This finding calls for more studies comparing analyses with and without an ILSA study as a prior achievement measure.

There are two possible explanations for these larger and additional effects. The first relates to the longer time interval between the first and the second measurement, achieved when the PISA study was utilised as a baseline measure. As PISA takes place at the end of a school year, the follow-up study design offered us additional interval time in comparison to previous studies that plan for a pre-test and post-test measure within the same school year (Kyriakides et al., 2020b). A second explanation relates to the educational level at which the study was conducted. The great majority of studies that utilise the DMEE framework or the educational dialogue framework concern younger age groups and especially primary education, possibly because of easier access and the more flexible schedules of primary school teachers (e.g., Higham, Brindley & van de Pol, 2014). To our knowledge, only three studies to date have used the DMEE in secondary education, in order to examine whether teachers exhibit the same teaching skills when teaching mathematics in different classrooms (Kokkinou & Kyriakides, 2022). The intellectual maturity of older students may allow secondary school teachers to use a wider range of practices related to effective teaching, as well as to initiate more sophisticated dialogues.

While further investigation is needed to shed more light on these explanations, the findings of the present study have important implications for policy. They suggest that following up ILSA studies could benefit areas of education that require interventions. In addition, such designs can be utilised for the evaluation of reforms in sectors of education that seem to lag behind those of other countries. More studies are, therefore, needed on policy implications in areas beyond the quality of teaching.

The findings have additional implications for the growing number of studies in the field of effectiveness that are based on secondary analyses of ILSA data. Although these studies secure large samples, they do not have a prior achievement measure (as in Analysis 1 in this paper). If these secondary analyses are conducted in countries where no national tests take place, then the design investigated in this paper can be used, with ILSA data acting as a prior achievement measure for a follow-up study. ILSA studies have not been designed for the purposes of effectiveness studies. Nevertheless, they can be an important resource for studying effectiveness by focusing not only on the short-term effects of teachers and schools but on long-term effects as well.

Acknowledgments

We are grateful to the schools, teachers and students who welcomed us in their classrooms and participated in the study. We also thank the team of experts of Modern Greek in secondary education for their consultation on the development of the literacy test.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Funding

This work was supported by the Cyprus Research & Innovation Foundation [COMPLEMENTARY/0916/0017].

Notes on contributors

Maria Vrikki

Maria Vrikki is a postdoctoral researcher at the Department of Education of the University of Cyprus. Her research focuses on forms of dialogue that are optimal for learning. She studies this in the classroom context, as well as in the context of collaborative teacher learning. Dr Vrikki has published widely in peer-reviewed journals and edited books, has presented her work at international conferences since 2013, and is listed as a Collaborator of the “Cambridge Educational Dialogue Research (CEDiR)” group at the University of Cambridge.

Leonidas Kyriakides

Leonidas Kyriakides is Professor of Educational Research and Evaluation at the Department of Education of the University of Cyprus. His main research interests are in the area of school effectiveness and school improvement and especially in modelling the dynamic nature of educational effectiveness and in using research to promote quality and equity in education. Leonidas acted as chair of the EARLI SIG on Educational Effectiveness and as chair of the AERA SIG on School Effectiveness and Improvement.

Andria Dimosthenous

Andria Dimosthenous was a postdoctoral researcher at the Department of Education at the University of Cyprus. Now, she is Assistant Professor of Educational Evaluation at the Department of Preschool Education at the University of Crete. She has participated in several international projects on educational effectiveness and school improvement. Her research interests involve parameters such as the home learning environment that affect student achievement gains.

Notes

1 Students in this year group are approximately 16 years old. This is the penultimate year of secondary education in Cyprus.


References

  • Alexander, R. (2018). Developing dialogic teaching: Genesis, process, trial. Research Papers in Education, 33(5), 561–598. https://doi.org/10.1080/02671522.2018.1481140
  • Andrich, D. (1988). A general form of Rasch's extended logistic model for partial credit scoring. Applied Measurement in Education, 1(4), 363–378. https://doi.org/10.1207/s15324818ame0104_7
  • Azigwe, J. B. (2016). Using comparative international studies for modeling educational effectiveness: A secondary analysis of PISA-2009 study. Journal of Education and Practice, 7(18), 199–209.
  • Baumert, J., & Demmrich, A. (2001). Test motivation in the assessment of student skills: The effects of incentives on motivation and performance. European Journal of Psychology of Education, 16(3), 441–462. https://doi.org/10.1007/BF03173192
  • Caro, D. H. (2011). Parent-child communication and academic performance: Associations at the within- and between-country level. Journal for Educational Research Online, 3(2), 15–37.
  • Caro, D. H., Kyriakides, L., & Televantou, I. (2018). Addressing omitted prior achievement bias in international assessments: An applied example using PIRLS-NPD matched data. Assessment in Education: Principles, Policy & Practice, 25(1), 5–27. https://doi.org/10.1080/0969594X.2017.1353950
  • Case, R. (1993). Theories of learning and theories of development. Educational Psychologist, 28(3), 219–233. https://doi.org/10.1207/s15326985ep2803_3
  • Cazden, C. B. (1986). Classroom discourse. In M. C. Wittrock (Ed.), Handbook of research on teaching (pp. 432–463). Macmillan.
  • Charalambous, C. Y., & Praetorius, A. K. (2020). Creating a forum for researching teaching and its quality more synergistically. Studies in Educational Evaluation, 67, 100894. https://doi.org/10.1016/j.stueduc.2020.100894
  • Christoforidou, M., & Kyriakides, L. (2021). Developing teacher assessment skills: The impact of the dynamic approach to teacher professional development. Studies in Educational Evaluation, 70, Article 101051. https://doi.org/10.1016/j.stueduc.2021.101051
  • Cordero, J. M., & Gil-Izquierdo, M. (2018). The effect of teaching strategies on student achievement: An analysis using TALIS-PISA-link. Journal of Policy Modeling, 40(6), 1313–1331. https://doi.org/10.1016/j.jpolmod.2018.04.003
  • Creemers, B. P. M., & Kyriakides, L. (2008). The dynamics of educational effectiveness: A contribution to policy, practice and theory in contemporary schools. Routledge.
  • Creemers, B. P. M., & Kyriakides, L. (2012). Improving quality in education: Dynamic approaches to school improvement. Routledge.
  • Creemers, B. P. M., Kyriakides, L., & Sammons, P. (2010). Methodological advances in educational effectiveness research. Routledge.
  • Creemers, B. P. M., & Reezigt, G. J. (1996). School level conditions affecting the effectiveness of instruction. School Effectiveness and School Improvement, 7(3), 197–228. https://doi.org/10.1080/0924345960070301
  • De Corte, E. (2000). Marrying theory building and the improvement of school practice: A permanent challenge for instructional psychology. Learning and Instruction, 10(3), 249–266. https://doi.org/10.1016/S0959-4752(99)00029-8
  • Delandshere, G. (2002). Assessment as inquiry. Teachers College Record, 104(7), 1461–1484. https://doi.org/10.1111/1467-9620.00210
  • den Brok, P., Brekelmans, M., & Wubbels, T. (2004). Interpersonal teacher behaviour and student outcomes. School Effectiveness and School Improvement, 15(3–4), 407–442. https://doi.org/10.1080/09243450512331383262
  • Elliot, K., & Sammons, P. (2004). Exploring the use of effect sizes to evaluate the impact of different influences on child outcomes: Possibilities and limitations. In K. Elliot, & I. Schagen (Eds.), But what does it mean? The use of effect sizes in educational research (pp. 6–24). NFER.
  • Gustafsson, J.-E. (2013). Causal inference in educational effectiveness research: A comparison of three methods to investigate effects of homework on student achievement. School Effectiveness and School Improvement, 24(3), 275–295. https://doi.org/10.1080/09243453.2013.806334
  • Hambleton, R. K., & Swaminathan, H. (1985). Item response theory: Principles and applications. Kluwer.
  • Hennessy, S., Rojas-Drummond, S., Higham, R., Márquez, A. M., Maine, F., Ríos, R. M., García-Carrión, R., Torreblanca, O., & Barrera, M. J. (2016). Developing a coding scheme for analysing classroom dialogue across educational contexts. Learning, Culture and Social Interaction, 9, 16–44. https://doi.org/10.1016/j.lcsi.2015.12.001
  • Higham, R. J. E., Brindley, S., & Van de Pol, J. (2014). Shifting the primary focus: Assessing the case for dialogic education in secondary classrooms. Language and Education, 28(1), 86–99. https://doi.org/10.1080/09500782.2013.771655
  • Howe, C., Hennessy, S., Mercer, N., Vrikki, M., & Wheatley, L. (2019). Teacher-student dialogue during classroom teaching: Does it really impact upon student outcomes? Journal of the Learning Sciences, 28(4), 462–512. https://doi.org/10.1080/10508406.2019.1573730
  • Kaplan, D., & McCarty, A. R. (2013). Data fusion with international large scale assessments: A case study using the OECD PISA and TALIS surveys. Large-Scale Assessments in Education, 1(1), 1–26. https://doi.org/10.1186/2196-0739-1-6
  • Khine, M. S., Fraser, B. J., & Afari, E. (2020). Structural relationships between learning environments and students’ non-cognitive outcomes: Secondary analysis of PISA data. Learning Environments Research, 23(3), 395–412. https://doi.org/10.1007/s10984-020-09313-2
  • Kokkinou, E., & Kyriakides, L. (2022). Investigating differential teacher effectiveness: Searching for the impact of classroom context factors. School Effectiveness and School Improvement, 33(3), 403–430. https://doi.org/10.1080/09243453.2022.2030762
  • Kraiger, K., Ford, J. K., & Salas, E. (1993). Application of cognitive, skill-based and affective theories of learning outcomes to new methods of training evaluation. Journal of Applied Psychology, 78(2), 311–328. https://doi.org/10.1037/0021-9010.78.2.311
  • Krasne, S., Wimmers, P. F., Relan, A., & Drake, T. A. (2006). Differential effects of two types of formative assessment in predicting performance of first-year medical students. Advances in Health Sciences Education, 11(2), 155–171. https://doi.org/10.1007/s10459-005-5290-9
  • Kyriakides, L. (2016). A synthesis of studies using PISA data – Implications for research, policy and practice. Presented at PISA Seminar 2016, Oxford University Centre for Educational Assessment.
  • Kyriakides, L., Anthimou, M., & Panayiotou, M. (2020a). Searching for the impact of teacher behavior on promoting students’ cognitive and metacognitive skills. Studies in Educational Evaluation, 64, 100810. https://doi.org/10.1016/j.stueduc.2019.100810
  • Kyriakides, L., Charalambous, Y. C., & Charalambous, E. (2022). Using ILSAs to promote quality and equity in education: The contribution of the dynamic model of educational effectiveness. In T. Nilsen, A. Stancel-Piątak, & J.-E. Gustafsson (Eds.), International handbook of comparative large-scale studies in education: Perspectives, methods and findings (pp. 253–276). Springer.
  • Kyriakides, L., & Creemers, B. P. M. (2008). Using a multidimensional approach to measure the impact of classroom level factors upon student achievement: A study testing the validity of the dynamic model. School Effectiveness and School Improvement, 19(2), 183–205. https://doi.org/10.1080/09243450802047873
  • Kyriakides, L., Creemers, B. P. M., Panayiotou, A., & Charalambous, E. (2020b). Quality and equity in education: Revisiting theory and research on educational effectiveness and improvement. Routledge.
  • Lafontaine, D., Baye, A., Vieluf, S., & Monseur, C. (2015). Equity in opportunity-to-learn and achievement in reading: A secondary analysis of PISA 2009 data. Studies in Educational Evaluation, 47, 1–11. https://doi.org/10.1016/j.stueduc.2015.05.001
  • Le Donné, N., Fraser, P., & Bousquet, G. (2016). Teaching strategies for instructional quality: Insights from the TALIS PISA link data. OECD Education Working Papers, 148. OECD Publishing. https://doi.org/10.1787/5jln1hlsr0lr-en
  • Lindorff, A., & Sammons, P. (2018). Going beyond structured observations: Looking at classroom practice through a mixed method lens. ZDM Mathematics Education, 50(3), 521–534. https://doi.org/10.1007/s11858-018-0915-7
  • Littleton, K., & Mercer, N. (2013). Interthinking: Putting talk to work. Routledge.
  • Liu, J., & Jiang, Z. (2018). The synergy theory of economic growth. In J. Liu & Z. Jiang (Eds.), The synergy theory on economic growth: Comparative study between China and developed countries (pp. 57–90). Springer. https://doi.org/10.1007/978-981-13-1885-6
  • Mercer, N., Wegerif, R., & Major, L. (2020). The Routledge international handbook of research on dialogue education. Routledge.
  • Michaels, S., & O’Connor, C. (2015). Conceptualizing talk moves as tools: Professional development approaches for academically productive discussion. In L. B. Resnick, C. Asterhan, & S. N. Clarke (Eds.), Socializing intelligence through talk and dialogue (pp. 347–361). American Educational Research Association.
  • Muhonen, H., Pakarinen, E., Poikkeus, A.-M., Lerkkanen, M.-K., & Rasku-Puttonen, H. (2018). Quality of educational dialogue and association with students’ academic performance. Learning and Instruction, 55, 67–79. https://doi.org/10.1016/j.learninstruc.2017.09.007
  • Mullis, I. V. S., Martin, M. O., Foy, P., & Drucker, K. T. (2012). PIRLS 2011 international results in reading. TIMSS & PIRLS International Study Center, Boston College.
  • Nystrand, M., Wu, L. L., Gamoran, A., Zeiser, S., & Long, D. A. (2003). Questions in time: Investigating the structure and dynamics of unfolding classroom discourse. Discourse Processes, 35(2), 135–198. https://doi.org/10.1207/S15326950DP3502_3
  • O’Dwyer, L. M., Wang, Y., & Shields, K. A. (2015). Teaching for conceptual understanding: A cross-national comparison of the relationship between teachers’ instructional practices and student achievement in mathematics. Large-Scale Assessments in Education, 3(1), 1–30. https://doi.org/10.1186/s40536-014-0011-6
  • Panayiotou, A., Kyriakides, L., Creemers, B. P. M., McMahon, L., Vanlaar, G., Pfeifer, M., Rekalidou, G., & Bren, M. (2014). Teacher behaviour and student outcomes: Results of a European study. Educational Assessment, Evaluation and Accountability, 26, 73–93. https://doi.org/10.1007/s11092-013-9182-x
  • Paris, S. G., & Paris, A. H. (2001). Classroom applications of research on self-regulated learning. Educational Psychologist, 36(2), 89–101. https://doi.org/10.1207/S15326985EP3602_4
  • OECD. (2018). PISA 2018 technical report. OECD Publishing. https://www.oecd.org/pisa/data/pisa2018technicalreport/
  • Rosenshine, B. (1983). Teaching functions in instructional programs. The Elementary School Journal, 83(4), 335–351. https://doi.org/10.1086/461321
  • Rutkowski, L., Gonzalez, E., Joncas, M., & von Davier, M. (2010). International large-scale assessment data: Issues in secondary analysis and reporting. Educational Researcher, 39(2), 142–151. https://doi.org/10.3102/0013189X10363170
  • Scheerens, J. (2013). The use of theory in school effectiveness research revisited. School Effectiveness and School Improvement, 24(1), 1–38. https://doi.org/10.1080/09243453.2012.691100
  • Sinclair, J., & Coulthard, M. (1975). Towards an analysis of discourse: The English used by teachers and pupils. Oxford University Press.
  • Song, S., Perry, L. B., & McConney, A. (2014). Explaining the achievement gap between indigenous and non-indigenous students: An analysis of PISA 2009 results for Australia and New Zealand. Educational Research and Evaluation, 20(3), 178–198. https://doi.org/10.1080/13803611.2014.892432
  • Stenmark, J. K. (1992). Mathematics assessment: Myths, models, good questions, and practical suggestions. NCTM.
  • Vrikki, M., & Evagorou, M. (2023). An analysis of teacher questioning practices during dialogic lessons. International Journal of Educational Research, 117, Article 102107. https://doi.org/10.1016/j.ijer.2022.102107
  • Vrikki, M., & Kyriakides, L. (2018, October). The development and validation of an observation tool for classroom dialogue. Paper presented at the EARLI SIG 20 & 26 Conference, Jerusalem.
  • Vrikki, M., Kyriakides, L., & Anastasou, M. (accepted). Searching for the impact of teaching quality and dialogic behaviour on student learning outcomes. Journal for the Study of Education and Development.
  • Vrikki, M., Wheatley, L., Howe, C., Hennessy, S., & Mercer, N. (2019). Dialogic practices in primary school classrooms. Language and Education, 33(1), 85–100. https://doi.org/10.1080/09500782.2018.1509988
  • Wells, G. (1999). Dialogic inquiry: Towards a sociocultural practice and theory of education. Cambridge University Press.
  • Wilks, R. (1996). Classroom management in primary schools: A review of the literature. Behaviour Change, 13(1), 20–32. https://doi.org/10.1017/S0813483900003922
  • Wright, B. D. (1985). Additivity in psychological measurement. In E. E. Roskam (Ed.), Measurement and personality assessment (pp. 101–112). Elsevier.