PdfWox

Extract text from a PDF

Pull clean plain text out of any PDF in your browser. Text PDFs extract instantly; scanned PDFs go through in-browser OCR.

Files are processed entirely in your browser. Nothing is uploaded to any server.

About this tool

PDF to Text pulls the words out of a PDF and gives them back as a plain text file you can open in any editor, copy into another document, or feed into any workflow that expects plain text rather than a PDF. Text-based PDFs (those created from Word, Google Docs, or any other authoring tool) are parsed instantly, with no extra processing. Scanned PDFs go through an OCR step automatically.
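This page does not say how the tool tells a text-based PDF from a scanned one. One common heuristic, sketched here purely as an assumption, is to read a page's text layer first and fall back to OCR when it is nearly empty. The `{ items: [{ str }] }` shape mirrors what pdf.js returns from `getTextContent()`; whether this tool uses pdf.js is a guess, and the function name `needsOcr` and the 20-character threshold are illustrative.

```javascript
// Hypothetical heuristic: a page "needs OCR" when its text layer holds
// almost no visible characters. The threshold is an assumption, not a
// value taken from the tool.
function needsOcr(textContent, minChars = 20) {
  const visibleChars = textContent.items
    .map((item) => item.str || '')
    .join('')
    .replace(/\s+/g, '').length;
  return visibleChars < minChars;
}

// A page exported from a word processor carries real text items.
const typedPage = { items: [{ str: 'Quarterly report for the board,' }, { str: ' 2023' }] };
// A scanned page is usually just an image, so its text layer is empty.
const scannedPage = { items: [] };

console.log(needsOcr(typedPage));   // false
console.log(needsOcr(scannedPage)); // true
```

A check like this is cheap, which fits the claim that text-based PDFs are handled instantly while only scans pay the cost of OCR.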

When OCR is needed, Tesseract.js runs the recognition directly in your browser using WebAssembly. A small English language model (around 3 MB) is downloaded the first time and then cached. Each page of the scanned PDF is rendered as an image and fed through the OCR engine, which returns a text transcript. You can review and edit the extracted text in the page before downloading the final file.
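The per-page loop described above can be sketched as follows. This is a minimal sketch, not the tool's actual code: `ocrPages` and its `recognize` parameter are hypothetical names, and the recognizer is injected so the loop runs without a browser. The comment shows how a Tesseract.js v5 worker could be plugged in; older Tesseract.js versions initialise the worker differently.

```javascript
// Runs each rendered page image through an OCR function, one page at a time,
// and joins the per-page transcripts. `recognize` is any async function that
// takes an image and resolves to its text.
async function ocrPages(pageImages, recognize) {
  const transcripts = [];
  for (const image of pageImages) {
    const text = await recognize(image);
    transcripts.push(text.trim());
  }
  // Separate pages with a blank line in the final transcript.
  return transcripts.join('\n\n');
}

// In a browser with Tesseract.js v5 loaded, the recognizer could be wired up
// roughly like this (assumption: v5 API, page images already on canvases):
//   const worker = await Tesseract.createWorker('eng');
//   const text = await ocrPages(canvases, async (canvas) => {
//     const { data } = await worker.recognize(canvas);
//     return data.text;
//   });
//   await worker.terminate();

// Stand-in recognizer so the sketch runs anywhere.
const fakeRecognize = async (image) => `text of ${image}\n`;
ocrPages(['page-1', 'page-2'], fakeRecognize).then((text) => console.log(text));
```

Processing pages one after another keeps memory use predictable, which matters when everything runs inside a browser tab.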

Plain text loses the visual layout — columns merge, tables flatten, and whitespace-based alignment disappears. If you need the text to stay searchable inside the original PDF rather than extracted separately, use the OCR PDF tool instead, which embeds the recognised text as a hidden layer under the original page image.
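To see why columns merge, consider a simplified model of extraction in which every text fragment has a position and the extractor emits fragments line by line, left to right. The function and field names below are hypothetical, and real extractors are more elaborate, but the effect on a two-column page is the same.

```javascript
// Each fragment is a piece of text with a position on the page
// (x grows to the right, y grows downward in this simplified model).
function flattenToPlainText(fragments) {
  const lines = new Map();
  for (const frag of fragments) {
    if (!lines.has(frag.y)) lines.set(frag.y, []);
    lines.get(frag.y).push(frag);
  }
  return [...lines.keys()]
    .sort((a, b) => a - b)
    .map((y) =>
      lines.get(y)
        .sort((a, b) => a.x - b.x)
        .map((frag) => frag.text)
        .join(' ')
    )
    .join('\n');
}

// Two columns side by side: left column at x=50, right column at x=320.
const twoColumnPage = [
  { x: 50, y: 100, text: 'Left column,' },
  { x: 50, y: 120, text: 'continued.' },
  { x: 320, y: 100, text: 'Right column,' },
  { x: 320, y: 120, text: 'continued.' },
];

console.log(flattenToPlainText(twoColumnPage));
// Left column, Right column,
// continued. continued.
```

The horizontal gap that separated the columns on the page becomes a single space, so the two columns interleave in the output.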

How it works

  1. Upload PDF

    Drop or pick the PDF.

  2. We extract text

    Text-based PDFs are parsed instantly. Scanned PDFs go through OCR in your browser.

  3. Edit & download

    Clean up artifacts if you want, then download a .txt file.
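The tool lets you clean up the text by hand. If you process many files, a small script can handle the most common artifacts in the downloaded .txt first. The sketch below is a hypothetical post-processing pass, not something the tool is documented to run; `cleanExtractedText` is an illustrative name.

```javascript
// Hypothetical post-processing pass for extracted text.
function cleanExtractedText(raw) {
  return raw
    .replace(/\r\n?/g, '\n')                 // normalise line endings
    .replace(/(\p{L})-\n(\p{L})/gu, '$1$2')  // rejoin words hyphenated across a line break
    .replace(/[ \t]+/g, ' ')                 // collapse runs of spaces and tabs
    .replace(/ +\n/g, '\n')                  // drop trailing spaces
    .replace(/\n{3,}/g, '\n\n')              // keep at most one blank line
    .trim();
}

const rawText = 'Text   extrac-\ntion can leave  \n\n\n\nodd spacing.  ';
console.log(cleanExtractedText(rawText));
// Text extraction can leave
//
// odd spacing.
```

The hyphen rule is deliberately blunt: it also joins genuine compounds that happen to break at a line end (for example "well-" followed by "known"), so review the result before relying on it.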

Frequently asked questions

Is my file uploaded?
No. Both extraction and the OCR fallback run in your browser tab. You can verify this in the Network tab of your browser's DevTools.
How does OCR work in the browser?
Tesseract.js runs a WebAssembly OCR engine in a Web Worker. The first run downloads an English language model of about 3 MB; later runs reuse the cached copy and start faster.
Will it work on a poorly scanned PDF?
Quality depends on the scan. Clean, straight, high-contrast scans give the best results; faded or skewed scans return lower-quality text.
Is there a maximum file size?
There is no fixed limit; the practical ceiling is your device's memory. We've tested files up to 50 MB.
Will it preserve layout?
Plain text loses layout. For a searchable PDF that keeps the original page, use OCR PDF.
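
On the scan-quality question above: if you run OCR yourself, the engine's confidence score is a quick way to spot pages that deserve a manual check. The sketch below assumes each page result carries a `confidence` value on a 0-100 scale, as in the `data` object Tesseract.js returns; the function name `pagesToReview` and the cutoff of 70 are arbitrary choices for illustration, and the tool itself is not documented to expose this score.

```javascript
// Returns the page numbers whose OCR confidence falls below the cutoff,
// so they can be re-checked by hand.
function pagesToReview(results, minConfidence = 70) {
  return results
    .filter((result) => result.confidence < minConfidence)
    .map((result) => result.page);
}

const ocrResults = [
  { page: 1, confidence: 93 }, // clean, straight scan
  { page: 2, confidence: 41 }, // faded or skewed scan
  { page: 3, confidence: 88 },
];

console.log(pagesToReview(ocrResults)); // [ 2 ]
```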

Related tools

OCR PDF

Make scans searchable & selectable.

Fill PDF

Type into PDF form fields and download a filled copy.

Sign PDF

Add your signature to any PDF.

Embed this tool

Let your visitors use PDF to Text without leaving your site. Paste the snippet below into any HTML page. Files stay private — everything runs in the visitor's browser.

<iframe
  src="https://pdfwox.com/embed/pdf-to-text"
  width="100%"
  height="600"
  style="border:none;border-radius:8px"
  title="pdf-to-text tool"
  allow="downloads"
  loading="lazy"
></iframe>
<script>
window.addEventListener('message',function(e){
  if(e.data&&e.data.type==='privpdf-resize'){
    var f=document.querySelector('iframe[src="https://pdfwox.com/embed/pdf-to-text"]');
    // Only accept resize messages sent by the embed's own window.
    if(f&&e.source===f.contentWindow)f.style.height=e.data.height+'px';
  }
});
</script>

The iframe resizes automatically to fit its content: the embedded page reports its height with postMessage, and the script in the snippet applies it.
