Observation Survey of Early Literacy Achievement


Rating Summary

Classification Accuracy: full bubble
Reliability: full bubble
Validity: full bubble
Disaggregated Reliability and Validity Data: full bubble
Administration & Scoring Time: 15-45 Minutes
Scoring Key: Computer Scored
Benchmarks / Norms: Yes
Cost | Technology, Human Resources, and Accommodations for Special Needs | Service and Support | Purpose and Other Implementation Information | Usage and Reporting

$97.80 - $117.80 for a complete set of assessment materials per teacher. Included are:

  • Observation Survey book that includes rationales; procedures for administering, scoring, and interpreting the tasks; forms; and technical information
  • Two specially designed books to be used with Concepts About Print
  • One copy per teacher of the book Where’s Spot, used with Running Records of Text Reading
  • Scott Foresman Testing Packet

There are no additional costs for subsequent years unless additional teachers are administering the Survey.

Computer and Internet access are required for full use of product services in Reading Recovery schools.

Testers will require 4-8 hours of training. Reading Recovery teachers receive more extensive training.

Training costs are part of Reading Recovery implementation budgets. Reading Recovery sites employ a teacher leader who is trained in the administration, scoring, and interpretation of the Observation Survey. That teacher leader is responsible for training teachers to use the tool for screening, selecting children for intervention, monitoring progress, and making Reading Recovery exit decisions.

Because the Observation Survey measures authentic literacy knowledge and is administered individually, accommodations are made based on teacher observations of each child.

P.O. Box 6926
Portsmouth, NH 03802-6926

Phone: 800-225-5800
Website: www.heinemann.com

Field-tested training manuals are included and should provide all implementation information.

Ongoing technical support for Reading Recovery teachers is available via trained leaders.

Professional learning modules including videos for administering and interpreting Running Records of Text Reading are available from the Reading Recovery Council of North America.

The Observation Survey of Early Literacy Achievement comprises six systematic, standard observation tasks that yield a composite and comprehensive assessment of the literacy performance of young learners. The six tasks are: concepts about print (how print encodes information), the reading of continuous text, letter knowledge, reading vocabulary, writing vocabulary, and phonemic awareness and sound-symbol relationships. Children are assessed individually by a specially trained teacher.

Administration is done individually and requires 15-45 minutes per student. Scoring requires an additional 15 or more minutes.

Available scores include raw scores, percentiles, and stanines. IRT-based scores will be available in future print editions.

The Observation Survey is administered individually to children perceived by their first-grade classroom teachers as the lowest literacy achievers in their classrooms and serves as the screening process to identify the lowest achievers. The tester is a specially trained teacher. For each task, the tester records the raw score and the stanine. In consultation with the school team and the Reading Recovery teacher leader, the children with the lowest stanine scores across the Observation Survey are identified, without regard to cultural or linguistic diversity or disabilities.


Classification Accuracy

Classification Accuracy in Predicting Proficiency Level on the Slosson Oral Reading Test
n = 1,826
False Positive Rate: 0.20
False Negative Rate: 0.20
Sensitivity: 0.80
Specificity: 0.80
Positive Predictive Power: 0.60
Negative Predictive Power: 0.92
Overall Classification Rate: 0.80
AUC (ROC): 0.87
Base Rate: 0.26
Cut Points: 425
(At 90% sensitivity, the cut is 407; at 80% sensitivity, 425; and at 70% sensitivity, 435.)
At 90% Sensitivity, Specificity equals 0.65
At 80% Sensitivity, Specificity equals 0.80
At 70% Sensitivity, Specificity equals 0.86
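
The predictive power and overall classification figures above follow from sensitivity, specificity, and the base rate. The sketch below shows that relationship using the standard definitions; the function name `predictive_values` is illustrative and not part of the Observation Survey materials. Because the inputs are the rounded values published above, the computed positive predictive power (about 0.58) differs slightly from the reported 0.60, which may reflect rounding.

```python
def predictive_values(sensitivity, specificity, base_rate):
    """Return (PPV, NPV, overall classification rate).

    All arguments are proportions between 0 and 1. The base rate is the
    proportion of the sample that is truly at risk on the criterion.
    """
    true_pos = sensitivity * base_rate
    false_neg = (1 - sensitivity) * base_rate
    true_neg = specificity * (1 - base_rate)
    false_pos = (1 - specificity) * (1 - base_rate)

    ppv = true_pos / (true_pos + false_pos)   # positive predictive power
    npv = true_neg / (true_neg + false_neg)   # negative predictive power
    overall = true_pos + true_neg             # proportion classified correctly
    return ppv, npv, overall


# Rounded values reported for the full sample (n = 1,826).
ppv, npv, overall = predictive_values(0.80, 0.80, 0.26)
print(f"PPV={ppv:.2f}  NPV={npv:.2f}  overall={overall:.2f}")
```

The same function can be applied to any subgroup for which sensitivity, specificity, and base rate are reported.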



Description of study sample:

  • Number of States: 44
  • Size: 1,826
  • Gender:
    • 49.4% Male
    • 50.5% Female 
    • 0.1% Unknown
  • SES: 42.5% Eligible for free or reduced-price lunch
  • Race/Ethnicity:
    • 69.3% White, Non-Hispanic
    • 14.8% Black, Non-Hispanic
    • 8.4% Hispanic
    • 0.5% American Indian/Alaska Native
    • 3.0% Asian/Pacific Islander
    • 4.0% Other
  • Disability classification:
    • 92.8% No Disability
    • 3.2% Speech and Language Impairment
    • 0.7% Developmental Delay
    • 0.4% Specific Learning Disability
    • 2.9% Other Disabilities
  • First Language:
    • 90.6% English
    • 6.5% Spanish
    • 0.3% Chinese
    • 0.3% Vietnamese
    • 0.1% Arabic
    • 0.6% Other Languages
  • Language proficiency status: 8.6% ELL


Cross-Validation Study: Description of study sample:

  • Number of States: 47
  • Size: 1,594
  • Gender:
    • 48.5% Male
    • 51.4% Female 
    • 0.1% Unknown
  • SES: 48% Eligible for free or reduced-price lunch
  • Race/Ethnicity:
    • 70.2% White, Non-Hispanic
    • 15.4% Black, Non-Hispanic
    • 7.3% Hispanic
    • 0.8% American Indian/Alaska Native
    • 2.7% Asian/Pacific Islander
    • 3.6% Other
  • Disability classification:
    • 93.7% No Disability
    • 2.9% Speech and Language Impairment
    • 0.6% Developmental Delay
    • 0.4% Specific Learning Disability
    • 2.4% Other Disabilities
  • First Language:
    • 92.1% English
    • 5.1% Spanish
    • 0.4% Chinese
    • 0.1% Vietnamese
    • 0.3% Arabic
    • 2.0% Other Languages
  • Language proficiency status: 7.4% ELL


Type of Reliability | Age or Grade | n (range) | Coefficient | SEM
Alpha | 1st | 7,926 | 0.87 | 18.47
Split-half | 1st | 7,926 | 0.89 | 16.99
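
The source does not state how the SEM values were computed. As a rough consistency check, the sketch below assumes the classical test theory relation SEM = SD × sqrt(1 − reliability) and works backward to the score standard deviation each row implies. The helper names `sem` and `implied_sd` are illustrative only.

```python
import math


def sem(sd, reliability):
    """Standard error of measurement under classical test theory (assumed)."""
    return sd * math.sqrt(1.0 - reliability)


def implied_sd(sem_value, reliability):
    """Invert the SEM formula to recover the score standard deviation."""
    return sem_value / math.sqrt(1.0 - reliability)


# Coefficients and SEMs reported for the first-grade sample (n = 7,926).
sd_alpha = implied_sd(18.47, 0.87)
sd_split = implied_sd(16.99, 0.89)
print(f"SD implied by alpha: {sd_alpha:.1f}")
print(f"SD implied by split-half: {sd_split:.1f}")
```

Under this assumption both rows imply a standard deviation of roughly 51 scale-score points, so the two SEM values are consistent with a single underlying score distribution.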



Type of Validity | Age or Grade | Test or Criterion | n (range) | Coefficient | Information/Subjects
Predictive | 1st | Spring 2010 Slosson Oral Reading Test-Revised | 878 | 0.72 | A sub-sample of students were administered the Slosson Oral Reading Test-Revised (SORT-R) in fall, mid-year, and spring. The SORT-R technical manual, however, does not provide norms for first-grade students.
Predictive | 1st | Mid-Year Slosson Oral Reading Test-Revised | 789 | 0.75 |
Predictive | 1st | Spring 2010 Text Reading Level | 7,152 | 0.74 | The spring Text Reading Level was used as the criterion indicator for classification accuracy. Administered at end of year.
Predictive | 1st | Mid-Year Scale Score | 7,147 | 0.83 | The mid-year Observation Survey (OS) scale score is the OS composite score administered from late December to early February.
Construct | 1st | Fall 2009 Slosson Oral Reading Test-Revised | 820 | 0.78 | The fall SORT-R was administered within the same time frame as the fall OS.

Content Validity

Letter Identification

Concept Assessed: Letter Knowledge

Content Validity: It is common practice in the early grades for teachers to find out how many letters a child knows and if the child can visually distinguish letters one from another. In this task, all lowercase and uppercase letters (plus print forms of ‘a’ and ‘g’) are assessed. Children can respond with the letter name, a sound the letter makes, or a word beginning with the letter. Because this is a closed knowledge set, the task is most valid during the time a child is acquiring letter knowledge. The content represents what is actually taught in classrooms.

Ohio Word Test

Concept Assessed: Word Knowledge (Reading Vocabulary)

Content Validity: Word knowledge correlates strongly with text reading performance, and teachers of early readers seek ways to document the number of words a child knows and how this developing knowledge changes across time. The content of this task represents typical classroom expectations.

This task is structured to sample words that the children have had some opportunities to learn — words that occur frequently in their texts in school. Word lists (lists of 20 words) are drawn from the Dolch list of high-frequency words. Three equivalent lists are available. Because these lists include very high-frequency words, the task is most valid over the period when a child is acquiring an initial reading vocabulary.

In addition to information about a child’s knowledge of words in isolation and how this knowledge develops over time, the teacher can learn how a child works with words through attempts and self-corrections.

Concepts About Print

Concept Addressed: Print Knowledge
(what children know about the way spoken language is represented in print)

Content Validity: Conventions used for printed language must be learned so the child can attend to the essential visual information on the page. The task assesses a child’s current knowledge of 24 print concepts. Content validity is supported through the use of a specially designed book that allows the child to demonstrate knowledge of print concepts in an authentic setting.

Writing Vocabulary

Concept Addressed: Writing Vocabulary (a child’s personal resource of known words)

Content Validity: In this task, the child writes words he knows for a period of 10 minutes. The teacher may prompt the child in various ways to think of other known words. The score represents the number of words written independently and also serves as a screen on the child’s visual attention to print, sound sequence, motor control, and useful approximations. This task represents classroom instructional expectations.

Hearing and Recording Sounds in Words

Concept Addressed: Phonemic Awareness and Letter/Sound Relationships

Content Validity: The importance of phonemic awareness and representing phonemes with letters or clusters of letters is well documented in the research literature. Teachers can use the score on this dictation task (with a sampling of 37 phonemes) as an indicator of a child’s developing knowledge in this area. This task reflects classroom instructional practices.

The teacher dictates a passage (five forms available) and asks the child to say the words slowly and write letters to represent the sounds. Each phoneme recorded in a way that is acceptable in English is counted. Because this is a closed knowledge set, the task is most valid over the period when a child is acquiring this knowledge and before the words in the passage become part of a child’s known writing vocabulary.

Running Records of Text Reading

Concept Addressed: Instructional Reading Level for Reading Continuous Text
(also child’s behaviors while reading real books)

Content Validity: The score on this task represents the highest text level (from texts representing a gradient of difficulty) that a child reads at 90% accuracy or higher. The task is an authentic assessment of the reading of continuous text. The testing packet used for Reading Recovery in the United States has been shown to be a stable measure of reading performance that represents escalating gradients of difficulty.
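
The scoring rule described above (the highest text level read at 90% accuracy or higher) can be sketched as follows. This is a minimal illustration, not the published scoring procedure: it assumes accuracy is computed as (running words − errors) / running words, and the text levels, word counts, and error counts are hypothetical.

```python
def accuracy(running_words, errors):
    """Proportion of words read correctly (assumed accuracy formula)."""
    return (running_words - errors) / running_words


def highest_level_at_criterion(records, criterion=0.90):
    """Return the highest text level read at or above the criterion.

    `records` is a list of (text_level, running_words, errors) tuples.
    Returns None if no text meets the criterion.
    """
    passing = [
        level
        for level, words, errors in records
        if accuracy(words, errors) >= criterion
    ]
    return max(passing) if passing else None


# Hypothetical running records for one child: (level, running words, errors).
records = [(3, 60, 2), (4, 80, 6), (5, 100, 14)]
print(highest_level_at_criterion(records))
```

In this hypothetical case, levels 3 and 4 are read above 90% accuracy and level 5 is not, so the function returns level 4.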

Evidence from research and classroom practice confirms that text difficulty relates to a reader’s developing competencies. For learning to occur, the difficulty level of reading materials should present challenges from which the child can learn—texts that are not too hard or too easy.

As the child reads the texts, the teacher uses established conventions to record behaviors in order to analyze the child’s reading behaviors.

Disaggregated Reliability, Validity, and Classification Data for Diverse Populations

Disaggregated Classification Accuracy

Classification Accuracy in Predicting Proficiency on Slosson Oral Reading Test
Measure | African American (n = 516) | Hispanic or Latino (n = 270)
False Positive Rate | 0.27 | 0.31
False Negative Rate | 0.12 | 0.14
Sensitivity | 0.88 | 0.86
Specificity | 0.73 | 0.69
Positive Predictive Power | 0.68 | 0.60
Negative Predictive Power | 0.90 | 0.90
Overall Classification Rate | 0.79 | 0.75
AUC (ROC) | 0.86 | 0.86
Base Rate | 0.40 | 0.35
Cut Points | 425 | 425
At 90% Sensitivity, Specificity equals | 60% | 61%
At 80% Sensitivity, Specificity equals | 85% | 78%
At 70% Sensitivity, Specificity equals | 88% | 84%

Disaggregated Reliability

Type of Reliability | Grade | n | Coefficient | SEM | Information/Subjects
Alpha | 1st | 1,150 | 0.87 | 19.22 | African American
Split-half | 1st | 1,150 | 0.89 | 17.68 | African American
Alpha | 1st | 859 | 0.88 | 18.57 | Hispanic / Latino
Split-half | 1st | 859 | 0.89 | 17.78 | Hispanic / Latino