Media Ingestion
Researchers can upload recorded audio and video materials from mobile devices into a unified workflow.
- Audio and video upload
- Built for corpus organization

Designed for multimodal corpus processing and data organization, with support for audio/video analysis, corpus annotation, timeline review, and structured export.

CorpusSpark is a multimodal transcription and annotation platform built on AI technology, currently focused on reducing the cost and time of corpus processing. Researchers can upload recorded audio and video materials from mobile devices, and the platform combines speaker recognition with automatic transcription to complete the first round of corpus organization.
The platform supports synchronized viewing of transcript text, audio/video, and aligned timelines. After manual review and annotation, the results can be exported in formats compatible with the TalkBank CLAN workflow for subsequent analysis, archiving, and research use.
CorpusSpark currently supports iPhone, iPad, and Mac with Apple silicon. It is designed to be easy to use for both experienced researchers and first-time users. More feature details and updates are available through the CorpusSpark official account.
Its machine transcription capability can quickly handle labor-intensive preprocessing tasks such as content transcription, speaker labeling, and timeline alignment. Researchers can then continue with review, label enrichment, and more detailed transcription work aligned with their research goals.
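The output of this preprocessing stage can be pictured as timestamped, speaker-labeled segments on a shared timeline. A minimal sketch follows; the field names and helper are illustrative assumptions, not CorpusSpark's actual schema:

```python
from dataclasses import dataclass

@dataclass
class Segment:
    """One machine-produced transcript segment (illustrative, not CorpusSpark's schema)."""
    speaker: str            # machine-assigned speaker label, e.g. "SPK1"
    start_ms: int           # segment start on the shared timeline
    end_ms: int             # segment end
    text: str               # automatic transcription draft
    reviewed: bool = False  # flipped to True after manual review

def overlaps(a: Segment, b: Segment) -> bool:
    # Timeline alignment check: two segments overlap if their time spans intersect.
    return a.start_ms < b.end_ms and b.start_ms < a.end_ms
```

A structure like this is what makes the later review step possible: because every segment carries timestamps, the text can be replayed against the exact stretch of audio or video it came from.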
CorpusSpark is designed around the needs of linguistics research, balancing efficiency, accuracy, accessibility, and cost. It is also intended to support high-quality corpus co-creation, research data verification, and downstream applications.
Core functions and workflow:
- Upload: researchers bring recorded audio and video materials from mobile devices into a unified workflow.
- Automatic transcription: repetitive preprocessing work is handled automatically to produce an initial draft quickly.
- Speaker recognition: speakers are distinguished to give the conversation a clearer structure.
- Synchronized review: text, audio/video, and aligned timelines can be viewed together for easier replay and verification.
- Manual revision: users can correct machine results, update speaker labels, and add more detailed annotations.
- Export: reviewed results can be exported in TalkBank/CLAN-compatible formats for analysis and archiving.
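CorpusSpark's exact export format is not documented here, but the TalkBank CHAT format that CLAN reads has a well-known skeleton: `@Begin`/`@End` markers, header lines, and `*CODE:` utterance tiers separated from the speaker code by a tab. As an illustration only, reviewed segments could be assembled into a minimal CHAT-style transcript roughly like this (the `to_chat` function and segment fields are hypothetical):

```python
# Sketch: build a minimal TalkBank CHAT-style transcript from reviewed
# segments. This is an illustration of the target format, not
# CorpusSpark's actual export code.

def to_chat(segments, languages="eng"):
    """segments: list of dicts with 'speaker' (a 3-letter CHAT code)
    and 'text' keys, in timeline order."""
    codes = []
    for seg in segments:
        if seg["speaker"] not in codes:
            codes.append(seg["speaker"])
    # Placeholder role names; real CHAT headers name each participant's role.
    participants = ", ".join(f"{c} Speaker{i + 1}" for i, c in enumerate(codes))
    lines = [
        "@Begin",
        f"@Languages:\t{languages}",
        f"@Participants:\t{participants}",
    ]
    for seg in segments:
        # CHAT utterance tier: *CODE:<tab>utterance
        lines.append(f"*{seg['speaker']}:\t{seg['text']}")
    lines.append("@End")
    return "\n".join(lines)
```

A real CHAT file also carries `@ID` headers and, for media-linked corpora, time-alignment bullets per utterance; the sketch above only shows the overall shape a CLAN-compatible export needs to follow.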
- Audio, video, and text are brought into one workflow, starting from a unified pipeline.
- Transcription, analysis, and preprocessing run automatically, improving front-end efficiency.
- Annotation review and quality checks refine the results and keep outputs consistent.
- Results are exported in standardized formats, ready for training and integration.