Media Ingestion
Researchers can upload recorded audio and video materials from mobile devices into a unified workflow.
- Audio and video upload
- Built for corpus organization

CorpusSpark is a multimodal transcription and annotation platform built on AI technology, currently focused on reducing the cost and time of corpus processing. Researchers can upload recorded audio and video materials from mobile devices, and the platform combines speaker recognition with automatic transcription to complete the first round of corpus organization.
The platform supports synchronized viewing of transcript text, audio/video, and aligned timelines. After manual review and annotation, results can be exported in formats compatible with the TalkBank CLAN workflow for subsequent analysis, archiving, and research use.
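CLAN reads transcripts in the CHAT (.cha) format, so a CLAN-compatible export is essentially a time-aligned, speaker-labeled CHAT file. As a rough sketch of what such a file contains (the speaker codes, file name, and segment data below are made up for illustration, and the exact fields CorpusSpark writes are not specified here), a minimal export could be generated like this:

```python
# Illustrative sketch only: writes a minimal CHAT (.cha) file of the kind read by
# the TalkBank CLAN tools. Speaker codes, file name, and segments are hypothetical.

# One segment per utterance: speaker code, text, and start/end times in milliseconds.
segments = [
    ("INV", "how was your weekend ?", 0, 2100),
    ("PAR", "we went to the market on saturday .", 2100, 5800),
]

header = [
    "@Begin",
    "@Languages:\teng",
    "@Participants:\tINV Investigator, PAR Participant",
    "@Media:\tsession01, audio",  # media file name (without extension) and type
]

body = []
for speaker, text, start, end in segments:
    # \x15start_end\x15 is CHAT's time-alignment marker, linking the utterance
    # to a span of the media file in milliseconds.
    body.append(f"*{speaker}:\t{text} \x15{start}_{end}\x15")

# A fully validated file would also carry @ID header lines for each participant;
# they are omitted here to keep the sketch short.
with open("session01.cha", "w", encoding="utf-8") as f:
    f.write("\n".join(header + body + ["@End"]) + "\n")
```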
CorpusSpark currently supports iPhone, iPad, and Mac with Apple silicon. It is designed to be easy to use for both experienced researchers and first-time users. More feature details and updates are available through the CorpusSpark official account.

Additional notes on automatic processing, supported devices, and research use.
Its machine transcription capability can quickly handle labor-intensive preprocessing tasks such as content transcription, speaker labeling, and timeline alignment. Researchers can then continue with review, label enrichment, and more detailed transcription work aligned with their research goals.
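To make that division of labor concrete, the sketch below uses hypothetical field names (not CorpusSpark's actual data model) to show the kind of time-aligned, speaker-labeled segment an automatic pass might produce, and how a reviewer's corrections sit on top of it:

```python
from dataclasses import dataclass, replace

# Hypothetical record of the kind an automatic pass might produce; the field
# names are illustrative, not CorpusSpark's actual data model.
@dataclass(frozen=True)
class Segment:
    start_ms: int           # timeline alignment: position in the recording
    end_ms: int
    speaker: str            # label from speaker recognition, e.g. "SPK1"
    text: str               # draft text from automatic transcription
    reviewed: bool = False  # set once a researcher has checked the segment

# Draft segment straight out of the automatic pass.
draft = Segment(start_ms=0, end_ms=2100, speaker="SPK1", text="how was you weekend")

# Manual review: fix the wording, assign a meaningful speaker label, mark as done.
final = replace(draft, text="how was your weekend ?", speaker="INV", reviewed=True)
print(final)
```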
CorpusSpark is designed around the needs of linguistics research, balancing efficiency, accuracy, accessibility, and cost. It is also intended to support high-quality corpus co-creation, research data verification, and downstream applications.
The CorpusSpark platform organizes its core features around the multimodal corpus processing workflow:
- Upload: recorded audio and video from mobile devices enter a unified workflow.
- Automatic transcription: repetitive preprocessing work is handled automatically to produce an initial draft quickly.
- Speaker recognition: speakers are distinguished, giving the conversation a clearer structure.
- Synchronized review: transcript text, audio/video, and aligned timelines can be viewed together for easier replay and verification.
- Revision: users can correct machine results, update speaker labels, and add more detailed annotations.
- Export: after review, results can be exported in TalkBank CLAN-compatible formats for analysis and archiving.
The result is a clear process from data ingestion and automatic processing to human review and structured export:
- Audio, video, and text are brought into one workflow, starting from a unified pipeline.
- Transcription, analysis, and preprocessing run automatically, improving front-end efficiency.
- Annotation review and quality checks refine the results and keep outputs consistent.
- Results are exported in standardized formats, ready for training and integration.
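Put together, the stages above can be read as a simple pipeline. The function names below are placeholders for illustration, not a published CorpusSpark API:

```python
# Placeholder sketch of the stage ordering described above; these function names
# are made up for illustration and are not a published CorpusSpark API.

def ingest(media_paths):
    """Bring recorded audio/video into one unified workflow."""
    ...

def auto_process(session):
    """Automatic transcription, speaker labeling, and timeline alignment."""
    ...

def review(draft_segments):
    """Human pass: correct text, refine speaker labels, add annotations."""
    ...

def export_standard(reviewed_segments, out_path):
    """Write reviewed results in a standardized (e.g. CLAN-compatible) format."""
    ...

# Ingestion -> automatic processing -> human review -> structured export.
def run_pipeline(media_paths, out_path):
    session = ingest(media_paths)
    draft = auto_process(session)
    reviewed = review(draft)
    export_standard(reviewed, out_path)
```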