Supported Formats
Smidge accepts a wide range of source formats. You can combine multiple formats in a single generation, for example a PDF guide plus a YouTube tutorial plus a markdown reference doc.
Format Reference
| Format | Extensions | Library | Max Size | Notes |
|---|---|---|---|---|
| PDF | .pdf | pdf-parse | 50 MB | Text-based PDFs work best. Scanned documents may have reduced quality. |
| YouTube | URL | youtube-transcript | No limit | Paste any YouTube URL. Auto-extracts transcript. Videos without captions are not supported. |
| Audio | .mp3 .wav .m4a .ogg .webm | OpenAI Whisper | 25 MB | Transcribed to text via Whisper API. English works best; other languages are supported but quality varies. |
| Word | .docx | mammoth | 25 MB | Extracts text and basic formatting. Images are stripped. |
| PowerPoint | .pptx | pptx-parser | 25 MB | Extracts slide text and speaker notes. Layout/images are not preserved. |
| Spreadsheet | .xlsx .csv | xlsx / csv-parse | 10 MB | Rows are converted to markdown tables. Large datasets are truncated to the first 500 rows. |
| Plain Text | .txt .md | Native | 5 MB | Passed through directly. Markdown formatting is preserved. |
| HTML | .html | cheerio | 5 MB | Auto-cleaned to extract article content. Navigation, ads, and scripts are removed. |
| Subtitles | .srt .vtt | Native | 5 MB | Timestamps are stripped; text lines are joined into paragraphs. |
| Web URLs | URL | Scraper + cheerio | N/A | Paste any URL. Content is fetched, cleaned, and extracted. JavaScript-rendered pages may not extract fully. |
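
If you upload files programmatically, it can help to check the extension and size before sending anything. The sketch below is a minimal illustration, not Smidge's actual validation logic: the names `RULES`, `extensionOf`, and `checkSource` are hypothetical, the limits are copied from the table above, and it assumes 1 MB means 1,048,576 bytes. URL-based sources (YouTube and web pages) have no file size limit in the table and are left out.

```typescript
// Hypothetical pre-upload check. Limits mirror the Format Reference table;
// none of these names come from Smidge itself.
type FormatRule = { format: string; maxMB: number };

const RULES: Record<string, FormatRule> = {
  ".pdf": { format: "PDF", maxMB: 50 },
  ".mp3": { format: "Audio", maxMB: 25 },
  ".wav": { format: "Audio", maxMB: 25 },
  ".m4a": { format: "Audio", maxMB: 25 },
  ".ogg": { format: "Audio", maxMB: 25 },
  ".webm": { format: "Audio", maxMB: 25 },
  ".docx": { format: "Word", maxMB: 25 },
  ".pptx": { format: "PowerPoint", maxMB: 25 },
  ".xlsx": { format: "Spreadsheet", maxMB: 10 },
  ".csv": { format: "Spreadsheet", maxMB: 10 },
  ".txt": { format: "Plain Text", maxMB: 5 },
  ".md": { format: "Plain Text", maxMB: 5 },
  ".html": { format: "HTML", maxMB: 5 },
  ".srt": { format: "Subtitles", maxMB: 5 },
  ".vtt": { format: "Subtitles", maxMB: 5 },
};

type CheckResult =
  | { ok: true; format: string }
  | { ok: false; reason: string };

// Returns the lowercased extension including the dot, or "" if there is none.
function extensionOf(filename: string): string {
  const dot = filename.lastIndexOf(".");
  return dot === -1 ? "" : filename.slice(dot).toLowerCase();
}

// Checks a file name and byte size against the table's limits.
function checkSource(filename: string, sizeBytes: number): CheckResult {
  const ext = extensionOf(filename);
  const rule = RULES[ext];
  if (!rule) {
    return { ok: false, reason: `Unsupported extension: ${ext || "(none)"}` };
  }
  // Assumption: 1 MB = 1024 * 1024 bytes.
  const limitBytes = rule.maxMB * 1024 * 1024;
  if (sizeBytes > limitBytes) {
    return {
      ok: false,
      reason: `${rule.format} files are limited to ${rule.maxMB} MB`,
    };
  }
  return { ok: true, format: rule.format };
}

// Example: a 2 MB markdown file is within the 5 MB Plain Text limit.
console.log(checkSource("notes.md", 2 * 1024 * 1024));
```

A lookup table keyed by extension keeps the limits in one place, so a change to the reference table only needs a matching change to one entry.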
Tips for Best Results
Source Quality Matters
The quality of your generated skill depends heavily on the quality of your source materials. Clean, well-structured documents produce better skills than noisy or poorly formatted content.
Combine Complementary Sources
Combining a reference doc (PDF or markdown) with a practical tutorial (YouTube or audio) often produces the most useful skills. The reference provides structure while the tutorial provides real-world examples.
Size Considerations
Larger sources are not always better. If you have a 200-page PDF, consider extracting the most relevant chapters. The AI pipeline works best with focused, topical content rather than exhaustive reference material.