
28.8.2009 - 308

Free paper session: Acoustic and perceptual voice analysis II

Elena Kramer 1 , Rainer Schönweiler 1 , Roland Linder 2

1 University of Luebeck, Phoniatrics and Pedaudiology, Luebeck

2 University of Luebeck, Faculty of Science, Luebeck

This paper reports statistics on the fundamental frequency (F0) distribution in 145 dysphonic voices and 5 normal subjects, with special emphasis on the occurrence of low-frequency components in connected speech. We used kernel density estimates with an Epanechnikov kernel and an optimal bandwidth for mode detection. Density estimates of the F0 distribution were bimodal in 48% of the 150 voices, which we attributed to a notable amount of creaky voice and subharmonic frequencies. We used the distribution characteristics of F0 in dysphonic voices to investigate their effect on roughness judgements. We demonstrate the limitations of describing dysphonic voices by the mean, median and frequency range of long-term F0 measurements, and suggest that modal analysis offers an additional dimension for describing dysphonic voices.
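The mode-detection step can be illustrated with a minimal Python sketch; it is not the authors' implementation. The abstract says "optimal bandwidth" without naming a selector, so the normal-reference rule of thumb for the Epanechnikov kernel, h = 2.34 * s * n^(-1/5), is an assumption, as are the helper names, the 1 Hz evaluation grid and the synthetic F0 values.

```python
import statistics


def epanechnikov(u: float) -> float:
    """Epanechnikov kernel: K(u) = 0.75 * (1 - u^2) for |u| <= 1, else 0."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def rule_of_thumb_bandwidth(samples: list[float]) -> float:
    """Normal-reference bandwidth for the Epanechnikov kernel.

    Assumption: h = 2.34 * s * n^(-1/5). The abstract does not name the
    "optimal" bandwidth selector that was actually used.
    """
    n = len(samples)
    return 2.34 * statistics.stdev(samples) * n ** (-0.2)


def kde(samples: list[float], grid: list[float], h: float) -> list[float]:
    """Kernel density estimate f(x) = 1/(n*h) * sum K((x - x_i) / h) on a grid."""
    n = len(samples)
    return [sum(epanechnikov((x - s) / h) for s in samples) / (n * h) for x in grid]


def find_modes(grid: list[float], density: list[float]) -> list[float]:
    """Return grid points that are local maxima of the estimated density.

    A point counts when it rises strictly above its left neighbour and is not
    below its right neighbour, so a flat two-point top is counted only once.
    """
    return [
        grid[i]
        for i in range(1, len(density) - 1)
        if density[i] > density[i - 1] and density[i] >= density[i + 1]
    ]


# Hypothetical synthetic F0 values in Hz (not study data): a creaky-voice
# cluster near 80 Hz and a modal-voice cluster near 200 Hz.
f0 = [70, 75, 80, 85, 90] * 60 + [190, 195, 200, 205, 210] * 120
h = rule_of_thumb_bandwidth(f0)   # roughly 34 Hz for this data
grid = list(range(0, 301))        # 1 Hz evaluation grid covering both clusters
modes = find_modes(grid, kde(f0, grid, h))
print(modes)  # [80, 200], i.e. a bimodal F0 distribution
```

Counting grid maxima is the simplest mode test. With real F0 tracks a small bump in a tail can show up as an extra mode, so a practical version might ignore maxima below some fraction of the highest peak; the abstract does not state any such threshold.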

Can Acoustic Analysis Differentiate Between Clear and Not Clear Voice Quality in Speech Pathologists?

Samantha Warhurst 1 , Catherine Madill 1 , Patricia McCabe 1 , Robert Heard 1

1 University of Sydney, Speech Pathology, Sydney

Question: Speech Pathologists (SPs) are professional voice users (Titze, 1997) who require a clear voice to communicate effectively (Rogerson & Dodd, 2004). The vocal control of an SP also has clinical significance, as SPs are required to manipulate voice quality and power when modeling evidence-based voice therapy techniques such as LSVT (Fox, 2002), Voicecraft (Bagnall & McCullough, 2006) and Resonant Voice Therapy (Verdolini et al., 1995). In Australia, student SPs receive training in voice therapy techniques during their degree. However, there is little evidence on the vocal skill level of graduating speech pathology students.

No acoustic norms currently exist to differentiate highly skilled from unskilled voice users, so a method of differentiating vocally skilled populations such as SPs from the broader population is required. We therefore investigated:
1) How many graduating student SPs can produce a clear vocal quality that is appropriate for modeling in voice therapy?
2) Can acoustic measures differentiate between perceptually clear and not clear voices?

Method: Female graduating student SPs (n=35) were asked to produce a clear voice on a long vowel, /a/. Two experienced listeners judged vocal clarity perceptually. Based on this judgement, subjects were grouped into a "clear" group (20%) and a "not clear" group (49%); the listeners disagreed on the remaining 31% of subjects. Acoustic analysis was conducted on the long /a/ vowel for the "clear" and "not clear" samples. All voice signals were classified by signal type, and Type 2 samples were excluded because acoustic results are unreliable for these signals (Titze, 1995). Mean jitter, shimmer, NHR (dB) and average fundamental frequency were compared between the perceptually "clear" and "not clear" groups using independent-groups t-tests.
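The group comparison can be sketched as follows, assuming an independent two-sample t-test with pooled (equal) variances; the abstract does not say whether a pooled or a Welch test was used. The function name is hypothetical, and Cohen's d is included only as one possible reading of "mean differences exceeded 1SD". With SciPy installed, `scipy.stats.ttest_ind(a, b)` runs the same pooled test and also returns the two-sided p-value, which this standard-library sketch does not compute.

```python
import math
import statistics


def pooled_t_test(a: list[float], b: list[float]) -> tuple[float, int, float]:
    """Independent two-sample t-test assuming equal group variances.

    Returns (t statistic, degrees of freedom, Cohen's d). The equal-variance
    (Student) form is an assumption; the abstract only says "group t-tests".
    """
    n_a, n_b = len(a), len(b)
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    # Pooled variance: each group's sample variance weighted by its degrees of freedom.
    pooled_var = (
        (n_a - 1) * statistics.variance(a) + (n_b - 1) * statistics.variance(b)
    ) / (n_a + n_b - 2)
    # Standard error of the difference between the two group means.
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (mean_a - mean_b) / std_err
    # Mean difference expressed in pooled-SD units.
    d = (mean_a - mean_b) / math.sqrt(pooled_var)
    return t, n_a + n_b - 2, d
```

A |d| above 1 corresponds to a group mean difference larger than one pooled standard deviation.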

Results: Eighty-six percent of subjects produced Type 1 signals, and for 96% of these subjects NHR was within normal limits (Zhang & Jiang, 2008). Despite this, only 20% (7/35) of all students produced a perceptually clear voice quality on demand. T-tests showed that mean jitter (p=0.032) and NHR (p=0.041) differed significantly between the two groups, with mean differences exceeding 1 SD for both measures. Jitter was higher and NHR (dB) was lower in the "not clear" group.

Conclusions: Only twenty percent of graduating students were able to produce a voice that two experienced listeners agreed was perceptually clear. Acoustic analysis (jitter and NHR) differentiated between perceptually clear and not clear voices. The study highlights a need for normative data for populations with high-level vocal skills (e.g. speech pathologists, actors and singers).

Comparison of automated and perceptual categorization of normal and pathological voices

Virgilijus Uloza 1 , Antanas Verikas 2,3 , Marija Bacauskiene 2 , Adas Gelzinis 2 , Ruta Pribuisiene 1 , Marius Kaseta 1

1 Kaunas University of Medicine, Otolaryngology, Kaunas

2 Kaunas University of Technology, Department of Electrical & Control Instrumentation, Kaunas

3 Halmstad University, Intelligent Systems Laboratory, Halmstad

Objective. The aim of the present study was to evaluate the accuracy of the developed automated voice categorization system in classifying voice signal samples into healthy and pathological classes, and to compare it with the classification accuracy attained by human experts.

Material and methods. We investigated the effectiveness of ten different feature sets for classifying voice recordings of sustained phonation of the vowel /a/ into the healthy class and two pathological voice classes, and proposed a new approach to building a sequential committee of support vector machines for this classification. Using a genetic search, we determined the optimal hyper-parameter values of the committee and the feature sets providing the best performance.

Results. The committee considerably improved classification accuracy compared with single classifiers based on one feature type. In experiments on 444 voice recordings of sustained phonation of the vowel /a/ from 148 subjects (three recordings per subject), we obtained a correct classification rate (CCR) of over 92% when classifying into the healthy and pathological classes, and over 90% when classifying into three classes (healthy voice and the nodular and diffuse lesion voice classes). The CCR obtained by human experts was about 74% and 60%, respectively.
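The abstract does not describe how the committee members' outputs are combined, so the sketch below substitutes a plain majority vote; it is not the authors' sequential SVM committee and omits the genetic hyper-parameter search. It only illustrates how per-recording class labels from several classifiers can be merged and how the correct classification rate (CCR) is computed. All names and the five example recordings are hypothetical.

```python
from collections import Counter


def committee_vote(member_predictions: list[list[str]]) -> list[str]:
    """Combine per-member class labels by plain majority vote.

    member_predictions[m][i] is the label committee member m assigns to
    recording i. Ties go to the label seen first among the members, because
    Counter.most_common orders equal counts by first occurrence.
    """
    n_items = len(member_predictions[0])
    return [
        Counter(member[i] for member in member_predictions).most_common(1)[0][0]
        for i in range(n_items)
    ]


def correct_classification_rate(predicted: list[str], actual: list[str]) -> float:
    """CCR = number of correctly classified samples / total number of samples."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Hypothetical outputs of three committee members for five recordings.
members = [
    ["healthy", "nodular", "diffuse", "healthy", "nodular"],
    ["healthy", "diffuse", "diffuse", "nodular", "nodular"],
    ["nodular", "nodular", "diffuse", "healthy", "healthy"],
]
truth = ["healthy", "nodular", "diffuse", "healthy", "diffuse"]

votes = committee_vote(members)
ccr = correct_classification_rate(votes, truth)
print(votes)  # ['healthy', 'nodular', 'diffuse', 'healthy', 'nodular']
print(ccr)    # 0.8 (4 of 5 recordings correct)
```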

Conclusion. Under the same conditions, the automated voice discrimination technique was considerably more accurate than the human experts.

This study was supported by COST Action 2103 “Advanced Voice Function Assessment”

Acoustic analysis, Maximum Phonation Time, and VHI for assessment of thyroplasty for vocal fold paralysis. A prospective study of twenty patients.

Ågot Grøntved 1 , Christian Faber 1 , John Jakobsen 1 , Bodel Krog Rasmussen 2

1 Odense University Hospital, Oto-rhino-laryngology, Odense

2 Center for Rehabilitation and Special Counseling, Speech Department, Odense

Question:

Thyroplasty with silicone rubber implantation is a surgical procedure for treating patients with vocal fold paralysis. The aim of the study was to evaluate the outcome of the operation and to determine which of the analyses would be most useful.

Material and methods:

Twenty consecutive patients were enrolled in the study.

To assess the treatment, we performed a videostroboscopic evaluation and measured Maximum Phonation Time. We recorded a Phonetogram to evaluate the capacity and intensity of the voice, and we analysed voice quality with the Multi-Dimensional Voice Program. The patients completed the Voice Handicap Index (VHI).

Results:

Voice capacity and intensity increased statistically significantly, with an improvement of 13 dB in Highest Intensity. Voice capacity increased more than 2.5-fold. Voice quality, measured by jitter%, shimmer% and the Voice Turbulence Index, improved significantly. The median Maximum Phonation Time increased from 3 seconds to 10 seconds. The mean VHI decreased by 40 points from a preoperative value of 82.

90% of the patients were satisfied with the outcome of the operation.

Conclusion:

Besides videostroboscopy, the Phonetogram is the most important analysis, because it quantifies voice capacity and voice intensity, the major problems that patients with vocal fold paralysis suffer from.

Taken together, all the analyses are important tools for guiding patients before they decide to undergo thyroplasty.

How does the hydration level affect the voice?

Maria Winberg Andersen 1 , Ågot Grøntved 2 , Christian Faber 2

1 CRS (Center for Rehabilitation and Special Counseling), Tale-høre instituttet, Odense

2 Odense University Hospital, Oto-rhino-laryngology, Odense

Pan European Voice Conference 2009