ReTeRom

Description of COBILIRO

Name: Multi-level Annotated Bimodal Corpus for Romanian (COBILIRO).

The overall objective of this project is to create a thesaurus with audio and textual resources, annotated at different acoustic and linguistic levels, which is to become the most important reference for this type of resource for the Romanian language.

Applicability: The audio data and annotated text will be the foundation for the development of natural-language human-machine interface technologies for Romanian: see Project 2 (TEPROLIN), Project 3 (TADARAV) and Project 4 (SINTERO).

Activities will consider: careful inventory of existing bimodal resources at project’s partners; harmonization of representation formats, annotations and metadata; designing, building and testing the infrastructure hosting the resources; developing processing and access tools for the consortium; augmenting the voice-text corpus, completing it with metadata, alignments and annotations; conducting statistical studies on the corpus, exploitation for research and production, and wide dissemination of the bimodal corpus, valorization and use of type A1, A2 and B checks.

1.1. State-of-the-art study on bimodal corpus design

1.2. Inventory of Romanian language data collections available at partners or in third-party coalitions, and of their storage formats.

1.3. Functional and architectural design of the infrastructure that will host the consortium's resources and tools for processing and access and the realization of a prototype

1.4. Dissemination.

2.1. Creating the common infrastructure for storing bimodal resources and for processing and searching tools

2.2. Designing solutions for the harmonization of different representations of existing collections (annotations and metadata)

2.3. Creating format convertors for the harmonization of different representations to a standard representation agreed upon within the consortium

2.4. Harmonization of existing collections

2.5. Dissemination

3.1. Increasing the size of the oral corpus with new recordings that duplicate texts from the CoRoLa corpus

3.2. Increasing the size of the bimodal corpus: metadata filling-in, alignment with the help of the algorithms developed in projects P2, P3 and P4 and manual and semiautomatic annotations of the bimodal corpus

3.3. Extracting statistics on the bimodal corpus

3.4. Designing applications for exploiting the bimodal corpus and the technologies for written and oral texts processing, created in projects P2, P3 and P4

3.5. Management and dissemination.

4.1. Other applications for the exploitation of the bimodal corpus and of the speech and text processing technologies developed in ReTeRom

4.2. Dissemination of the bimodal corpus

Description of TEPROLIN

Name: Technologies for processing natural language - text (TEPROLIN)

The general objective of this project is the development of a set of advanced technologies for the processing of natural language (text) in Romanian: morphological, syntactic and semantic analysis of texts, with annotation of the text collected in Project 1 (COBILIRO) on different linguistic levels (phoneme, syllable, word, part of speech, etc.).

Applicability: These technologies will be applied in machine text processing and interpretation systems for Romanian, in creating language models for speech recognition interfaces, and in text processing for text-to-speech synthesis interfaces.

Activities will include: inventory of integrated technologies for natural language processing in Romanian, processing and annotation at different linguistic levels of the bimodal corpus generated within COBILIRO (Project 1), evaluation of speech recognition systems (Project 3, TADARAV) and text-to-speech synthesis systems (Project 4, SINTERO) trained with the bimodal corpus from COBILIRO, valorization and use of type A1, A2 and B checks.

1.5. Defining the functional and architectural specifications of the integrated and configurable text processing platform

1.6. Defining the software modules and services offered by the project; identifying necessary adaptations for existing NLP modules and new modules needed

1.7. Making the necessary adaptations for the existing NLP modules identified in Activities 1.5 and 1.6

1.8. Creating and validating (possibly with necessary manual corrections) a bimodal corpus lexicon and incorporating it into the existing lexicon

1.9. Dissemination

2.6. Implementation of new modules conforming to the defined functional specifications

2.7. Phonetic transcription of the words from the validated lexicon

2.8. Implementation of the prototype for the integrated and configurable platform; testing, evaluating and validating the prototype

2.9. Processing the textual component of the bimodal corpus collected in project 1. Validation and correction of processing errors

2.10. Dissemination

3.6. Analysis of the errors of the ASR and TTS systems trained in projects 3 and 4 on the annotated and corrected bimodal corpus aggregated in project 1

3.7. Finalizing the development, testing and validation of the integrated and configurable platform for processing texts in Romanian; ready-to-use solution

3.8. Disseminating the results of TEPROLIN

4.3. Testing the dockerised TEPROLIN platform on new corpora

4.4. Dissemination of the dockerised TEPROLIN platform

Description of TADARAV

Name: Technologies for automatic annotation of audio data and for the creation of automatic speech recognition interfaces (TADARAV)

The overall objective of this project is to develop a set of advanced technologies for the automatic phonetic annotation of the voice signal collected in the corpus of Project 1 (COBILIRO), and for the creation of automatic speech recognition interfaces in Romanian using the language models generated in Project 2 (TEPROLIN).

Applicability: These technologies will be applied in automatic speech recognition systems and in the automatic segmentation and annotation of the voice signal required in Project 4 (SINTERO).

Activities will consider: an inventory of automatic phonetic annotation methods for voice using complementary ASR systems, the design and implementation of methods for filtering and aligning approximate transcriptions, the development and implementation of confidence score generation algorithms, the delivery of an ASR (and automatic transcription) technology based on confidence scores, valorization and use of type A1, A2 and B checks.

1.10. Study of well-known methods on the use of complementary ASR systems for the automatic generation of annotations

1.11. Study of well-known methods for alignment of approximate transcripts with speech signal

1.12. Study of well-known methods for generating confidence scores for Automatic Speech Recognition (ASR)

1.13. Design and implementation of a basic solution for automatic speech annotation using complementary ASR systems

1.14. Dissemination

2.11. Designing and implementing a basic solution for filtering and aligning the approximate transcriptions with speech signal

2.12. Designing and implementing a basic solution for generating ASR confidence scores

2.13. Enhancing automatic speech recognition solution using complementary ASR systems

2.14. Dissemination

3.9. Analysis of the impact of using complementary ASRs for generating annotations within the context of improving ASR systems

3.10. Improving the solution for filtering and aligning approximate transcription with the speech signal

3.11. Improving the solution for generating confidence scores for ASR

3.12. Analysis of the impact of using approximate transcriptions for retraining ASR systems

3.13. Analysis of the impact of using confidence scores for filtering ASR transcriptions for retraining ASR systems

3.14. Dissemination

4.5. Management and dissemination

Description of SINTERO

Name: Technologies for the realization of human-machine interfaces for text-to-speech synthesis with expressivity (SINTERO)

The overall objective of this project is the development of an advanced technology for the synthesis of high-quality and expressive speech in Romanian, based on the resources collected in Project 1 (COBILIRO) and the automatic annotations generated in Project 2 (TEPROLIN) for text and in Project 3 (TADARAV) for audio data.

Applicability: This technology will be applied to text-to-speech synthesis for Romanian, for generating new synthesized voices, and for adapting some applications dependent on speech style and expressiveness (e.g. TV news, oratory speech, emotional voices).

Activities will consider: inventory of methods for modeling and control of expressivity in text-to-speech synthesis systems, implementation of components for prosody modeling and adaptation of synthesized voices to new speakers, development of new technology for realization of text-to-speech synthesis interfaces with expressivity, valorization and use of type A1, A2 and B checks.

1.15. Identifying prosody patterns; highlighting correlations between text (morphology, syntax) and vocal signal

1.16. Identifying methods for automatic recognition and classification of the expression style in textual data sources

1.17. Analysis of the methods for automatic control and adaptation of the speakers' expressivity in the text-to-speech synthesis systems

1.18. Implementation of the automatic prosody control module

1.19. Dissemination

2.15. Implementing a module for identification of the speech style and expressivity level from text analysis

2.16. Implementing a module for the adaptation of the TTS system to a new speaker

2.17. Implementing a module for transplantation of a speaker’s prosody in the TTS system

2.18. Improving the prosody modelling and control component; software testing and validation/demonstration activities

2.19. Dissemination

3.15. Developing a new technology for adapting the synthetic voice to the style and expressivity of a new speaker

3.16. Developing a new method for quick adaptation of the synthetic voice using atypical audio data

3.17. Integrating a new technology and demonstrating it in the creation of human-computer interfaces for speech synthesis

3.18. Dissemination

4.6. Final evaluation and distribution of Project 4 technologies

4.7. Dissemination

Reports and publications

PHASE I REPORT

Technical-Scientific Report for ReTeRom
Phase I (2018).
1.10

TADARAV
Study of well-known methods on the use of complementary ASR systems for the automatic generation of annotations
1.11

TADARAV
Study of well-known methods for alignment of approximate transcripts with speech signal.
1.12

TADARAV
Study of well-known methods for generating confidence scores for Automatic Speech Recognition (ASR).
1.15

SINTERO
Identifying prosody patterns; highlighting correlations between text (morphology, syntax) and vocal signal.
1.16

SINTERO
Identifying methods for automatic recognition and classification of the expression style in textual data sources.
1.17

SINTERO
Analysis of the methods for automatic control and adaptation of the speakers' expressivity in the text-to-speech synthesis systems.
1.18

SINTERO
Implementation of the automatic prosody control module.
1.1

COBILIRO:
State-of-the-art study on bimodal corpus design.
1.2

COBILIRO:
Inventory of Romanian language data collections available at partners or in third-party coalitions, and of their storage formats.
1.3

COBILIRO:
Functional and architectural design of the infrastructure that will host the consortium's resources and tools for processing and access and the realization of a prototype.
1.4

DISSEMINATION
Dissemination and participation in technical-scientific events, including in the media.
1.5

TEPROLIN:
Defining the functional and architectural specifications of the integrated and configurable text processing platform.
1.6

TEPROLIN:
Defining the software modules and services offered by the project; identifying necessary adaptations for existing NLP modules and new modules needed.
1.7

TEPROLIN:
Making the necessary adaptations for the existing NLP modules identified in Activities 1.5 and 1.6
1.8

TEPROLIN:
Creating and validating (possibly with necessary manual corrections) a bimodal corpus lexicon and incorporating it into the existing lexicon
1.9

ICIA:
Web page launch.
2.1

COBILIRO:
Creating the common infrastructure for storing bimodal resources and for processing and searching tools.
2.2

COBILIRO:
Designing solutions for the harmonization of different representations of existing collections (annotations and metadata).
2.3

COBILIRO:
Creating format convertors for the harmonization of different representations to a standard representation agreed upon within the consortium.
2.4

COBILIRO:
Harmonization of existing collections.
2.5

COBILIRO:
Dissemination.
2.6

TEPROLIN:
Implementation of new modules conforming to the defined functional specifications.
2.7

TEPROLIN:
Phonetic transcription of the words from the validated lexicon
2.8

TEPROLIN:
Implementation of the prototype for the integrated and configurable platform; testing, evaluating and validating the prototype.
2.11 - 2.14

TADARAV:
Act 2.11 - Designing and implementing a basic solution for filtering and aligning the approximate transcriptions with speech signal.
Act 2.12 - Designing and implementing a basic solution for generating ASR confidence scores
Act 2.13 - Enhancing automatic speech recognition solution using complementary ASR systems
Act 2.14 - Dissemination
2.9 - 2.10

TEPROLIN:
Processing the textual component of the bimodal corpus collected in project 1. Validation and correction of processing errors.
2.15

SINTERO:
Implementing a module for identification of the speech style and expressivity level from text analysis.
2.16

SINTERO:
Implementing a module for the adaptation of the TTS system to a new speaker.
2.17

SINTERO:
Implementing a module for transplantation of a speaker’s prosody in the TTS system.
2.18

SINTERO:
Improving the prosody modelling and control component; software testing and validation/demonstration activities.
2.19

SINTERO:
Dissemination.
PHASE REPORT

Technical-Scientific Report for ReTeRom Phase II (2019).
2018

TEPROLIN:
Disseminations.
2018

Events organized in the
ReTeRom project
2019

TEPROLIN:
Disseminations.
2019

Events organized in the
ReTeRom project
RELATE

TEPROLIN:
Romanian Portal of Language Technologies.
2020

Scientific and technical report
(2018 - September 2020)
3.1

COBILIRO:
Increasing the size of the oral corpus with new recordings that duplicate texts from the CoRoLa corpus
3.2

COBILIRO:
Increasing the size of the bimodal corpus: metadata filling-in, alignment with the help of the algorithms developed in projects P2, P3 and P4 and manual and semiautomatic annotations of the bimodal corpus
3.3

COBILIRO:
Extracting statistics on the bimodal corpus
3.4

COBILIRO:
Designing applications for exploiting the bimodal corpus and the technologies for written and oral texts processing, created in projects P2, P3 and P4
3.5

COBILIRO:
Management and dissemination
3.6

TEPROLIN:
Analysis of the errors of the ASR and TTS systems trained in projects 3 and 4 on the annotated and corrected bimodal corpus aggregated in project 1
3.7

TEPROLIN:
Finalizing the development, testing and validation of the integrated and configurable platform for processing texts in Romanian; ready-to-use solution
3.8

TEPROLIN:
Dissemination
3.9

TADARAV:
Analysis of the impact of using complementary ASRs for generating annotations within the context of improving ASR systems
3.12

TADARAV:
Analysis of the impact of using approximate transcriptions for retraining ASR systems
3.13

TADARAV:
Analysis of the impact of using confidence scores for filtering ASR transcriptions for retraining ASR systems
3.14

TADARAV:
Dissemination
3.15

SINTERO:
Developing a new technology for adapting the synthetic voice to the style and expressivity of a new speaker
3.16

SINTERO:
Developing a new method for quick adaptation of the synthetic voice using atypical audio data
3.17

SINTERO:
Integrating a new technology and demonstrating it in the creation of human-computer interfaces for speech synthesis.
3.18

SINTERO:
Dissemination
2020

Scientific and technical report phase III
Resources and technologies for the development of human-computer interfaces in the Romanian language.
Workshop

Final phase Workshop
4.1.

COBILIRO:
Other applications for the exploitation of the bimodal corpus and of the speech and text processing technologies developed in ReTeRom
4.2.

COBILIRO:
Dissemination of the bimodal corpus
4.3.

TEPROLIN:
Testing the dockerised TEPROLIN platform on new corpora
4.4.

TEPROLIN:
Dissemination of the dockerised TEPROLIN platform
4.6

SINTERO:
Final evaluation and distribution of Project 4 technologies
4.7.

SINTERO:
Dissemination
2021
PHASE IV
REPORT

Project’s team

Acad. Dan TUFIȘ

Dr. Verginica BARBU MITITELU

Dr. Radu ION

Dr. Elena IRIMIA

Eric Curea

prof. Corneliu BURILEANU

prof. Dragoș BURILEANU

dr. Horia CUCU

dr. Dan ONEAȚĂ

Dan Cristea

Anca Bibiri

Daniela Gifu

Mihaela Onofrei

Ionuț Pistol

Andrei Scutelnicu

Diana Trandabat

prof. Mircea GIURGIU

dr. Adriana STAN

Vasile Păiș

Maria Mitrofan

Contact
