MarkItDown

MarkItDown is a utility library for converting various file types to Markdown, for purposes such as indexing and text analysis.

It presently supports:

  • PDF (.pdf)

  • PowerPoint (.pptx)

  • Word (.docx)

  • Excel (.xlsx)

  • Images (EXIF metadata, and OCR)

  • Audio (EXIF metadata, and speech transcription)

  • HTML (special handling of Wikipedia, etc.)

  • Various other text-based formats (csv, json, xml, etc.)
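
As a rough illustration of how a conversion might look in practice, the sketch below wraps the library in two small helpers. It assumes the package is installed under the name `markitdown` and exposes a `MarkItDown` class whose `convert()` result carries a `text_content` attribute. The names `LISTED_EXTENSIONS`, `is_listed`, and `to_markdown` are hypothetical helpers introduced here, and the extension set only mirrors the list above rather than the library's actual registry.

```python
from pathlib import Path

# Extensions named in the list above. Illustrative only: this is not the
# library's authoritative registry, and image/audio extensions are omitted
# because the list does not name them.
LISTED_EXTENSIONS = {
    ".pdf", ".pptx", ".docx", ".xlsx", ".html", ".csv", ".json", ".xml",
}


def is_listed(path: str) -> bool:
    """Return True if the file's extension appears in LISTED_EXTENSIONS."""
    return Path(path).suffix.lower() in LISTED_EXTENSIONS


def to_markdown(path: str) -> str:
    """Convert one file to Markdown text (assumes `markitdown` is installed)."""
    # Imported lazily so is_listed() works without the package present.
    from markitdown import MarkItDown

    converter = MarkItDown()
    result = converter.convert(path)
    return result.text_content
```

A caller could check `is_listed("report.pdf")` first and then pass the same path to `to_markdown` to obtain the Markdown text as a string.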
