    Case Study · Operations Automation

    How I Turned a 1,000-File Document Review into a One-Click Checklist

    A behind-the-scenes case study on automating a large-scale document review with Python and Streamlit.

    Document Review · Python · Streamlit · Compliance Ops

    Valentina Akpan, founder of Rellatech

    A client engagement came in with a detailed document request list: dozens of specific document types spanning over five years of business activity. Monthly financial reports, quarterly invoices, account statements, position reports, operational records.

    By the time we assessed what was on hand, the folder contained well over a thousand files. The brief was simple: confirm coverage against the checklist and surface every gap, fast.

    The Challenge

    The traditional approach is painful. Open the checklist. Open the folder. Click through files one by one, cross-referencing against each requirement, holding a running mental note of what is covered and what is still missing. With a thousand files spread across years of subfolders, that process can take days, and it is easy to miss things.

    1,000+ files across 5 years of subfolders

    40 distinct checklist requirements to satisfy

    Generic file names hiding the real content

    Manual review would take days with high error risk

    The Solution

    I built a lightweight desktop app using Python and Streamlit. Point it at a folder, and it checks every file against the required document checklist automatically.

    File names alone are unreliable, so the app reads inside the files. For each file, it extracts readable text from PDFs, Word documents, Excel workbooks, and plain text, then searches that content for keywords tied to each checklist requirement. Each requirement is mapped to its own set of search terms, and the app works through every file against every requirement.
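
    The extraction step described above could look roughly like the sketch below. It is an illustration under assumptions, not the app's actual code: the function names, the suffix-to-extractor table, and the choice to treat unreadable files as empty text are all hypothetical. The third-party imports (pdfplumber, python-docx, openpyxl) are done lazily so the plain-text path works even when they are not installed.

```python
from pathlib import Path


def extract_pdf(path: Path) -> str:
    import pdfplumber  # third-party, imported only when a PDF is seen

    with pdfplumber.open(str(path)) as pdf:
        # extract_text() may return None for pages with no text layer (e.g. scans)
        return "\n".join(page.extract_text() or "" for page in pdf.pages)


def extract_docx(path: Path) -> str:
    from docx import Document  # provided by the python-docx package

    return "\n".join(p.text for p in Document(str(path)).paragraphs)


def extract_xlsx(path: Path) -> str:
    from openpyxl import load_workbook

    workbook = load_workbook(str(path), read_only=True, data_only=True)
    try:
        cells = []
        for sheet in workbook.worksheets:
            for row in sheet.iter_rows(values_only=True):
                cells.extend(str(value) for value in row if value is not None)
        return "\n".join(cells)
    finally:
        workbook.close()


def extract_plain(path: Path) -> str:
    return path.read_text(encoding="utf-8", errors="ignore")


# Hypothetical mapping from file suffix to extractor.
EXTRACTORS = {
    ".pdf": extract_pdf,
    ".docx": extract_docx,
    ".xlsx": extract_xlsx,
    ".txt": extract_plain,
}


def extract_text(path: Path) -> str:
    """Return the readable text of a file, or "" if it cannot be read."""
    extractor = EXTRACTORS.get(path.suffix.lower())
    if extractor is None:
        return ""
    try:
        return extractor(path)
    except Exception:
        # Assumption: a corrupt or unsupported file counts as "no text"
        # rather than stopping the whole scan.
        return ""
```

    Keeping one small function per format means a new file type only needs one more entry in the table.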

    The result comes back as a live dashboard: how many items are covered, which ones still have nothing matched, and exactly which file satisfied each requirement.
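
    To make the "every file against every requirement" pass concrete, here is a minimal sketch. The checklist entries, search terms, and file names are made up for illustration, and simple case-insensitive substring matching is an assumption about how the search works.

```python
# Hypothetical keyword schema: requirement -> search terms.
CHECKLIST = {
    "Monthly financial report": ["monthly financial report", "month-end report"],
    "Quarterly invoice": ["quarterly invoice"],
    "Account statement": ["account statement", "statement of account"],
}


def match_requirements(texts, checklist):
    """Return {requirement: [files whose text contains any of its terms]}."""
    matches = {requirement: [] for requirement in checklist}
    for file_name, text in texts.items():
        lowered = text.lower()
        for requirement, terms in checklist.items():
            if any(term.lower() in lowered for term in terms):
                matches[requirement].append(file_name)
    return matches


def coverage(matches):
    """Split requirements into those with at least one match and those with none."""
    covered = [r for r, files in matches.items() if files]
    missing = [r for r, files in matches.items() if not files]
    return covered, missing


# Extracted text keyed by relative path (sample data, not from the engagement).
texts = {
    "2021/doc_017.pdf": "Monthly Financial Report for March 2021",
    "misc/scan_004.pdf": "Statement of account as at 30 June",
}
matches = match_requirements(texts, CHECKLIST)
covered, missing = coverage(matches)
print(covered)  # ['Monthly financial report', 'Account statement']
print(missing)  # ['Quarterly invoice']
```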

    What I Built and Connected

    Folder scanner

    Point it at any folder. It recurses into subfolders and indexes every file.

    Multi-format text extraction

    Reads PDFs, Word documents, Excel workbooks, and plain text.

    Keyword mapping per requirement

    Each checklist item is mapped to a set of search terms, run across every file.

    Live coverage dashboard

    Categories, items found vs missing, and which file satisfied each requirement.

    One-click exports

    Missing list, found list, and a ZIP of all matched files ready to hand off.

    Streamlit interface

    Lightweight desktop app the client can run themselves, no terminal required.
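
    As a rough illustration of the first component in this list, a recursive folder scan needs only the standard library. The function name and the decision to return sorted relative paths are assumptions for this sketch.

```python
from pathlib import Path


def scan_folder(root):
    """Return every file under root as a sorted list of relative POSIX-style paths."""
    root = Path(root)
    return sorted(
        path.relative_to(root).as_posix()
        for path in root.rglob("*")  # rglob walks all subfolders
        if path.is_file()
    )
```

    Relative paths keep the report readable and stay valid if the folder is moved or zipped.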

    What the Output Looks Like

    Results are grouped by category, mirroring the original checklist. Each category shows how many items were found out of the total. Anything with gaps expands automatically so the missing items surface immediately.
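
    One way to produce that grouped view is to compute a per-category summary first and render it second. The sketch below assumes a hypothetical `categories` mapping (category to requirement names) and a `matches` mapping (requirement to matched files). The `render` function uses Streamlit's `st.expander` with `expanded=True` for any category that has gaps, and imports Streamlit lazily so the summary logic can be used without it.

```python
def summarize_by_category(categories, matches):
    """Build one summary row per category, in checklist order."""
    summary = []
    for category, requirements in categories.items():
        missing = [r for r in requirements if not matches.get(r)]
        summary.append({
            "category": category,
            "found": len(requirements) - len(missing),
            "total": len(requirements),
            "missing": missing,
            "expanded": bool(missing),  # open automatically when there are gaps
        })
    return summary


def render(summary):
    import streamlit as st  # third-party, only needed when drawing the dashboard

    for row in summary:
        label = f"{row['category']}: {row['found']} of {row['total']} found"
        with st.expander(label, expanded=row["expanded"]):
            for requirement in row["missing"]:
                st.write(f"Missing: {requirement}")
```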

    Every matched file gets a relative path and a one-click download button. A bulk export option produces a missing documents list, a found documents list, and a ZIP of all matched files ready to hand off.
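
    The three exports can be built in memory with the standard library. This is a sketch under assumptions: the output names are hypothetical, and `matches` is assumed to map each requirement to the relative paths of its matched files.

```python
import io
import zipfile
from pathlib import Path


def build_exports(root, matches):
    """Return the missing list, the found list, and a ZIP of all matched files."""
    root = Path(root)
    missing_list = "\n".join(req for req, files in matches.items() if not files)
    found_list = "\n".join(
        f"{req}: {', '.join(files)}" for req, files in matches.items() if files
    )
    # A file may satisfy several requirements, so de-duplicate before zipping.
    matched_files = sorted({f for files in matches.values() for f in files})
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for rel_path in matched_files:
            archive.write(root / rel_path, arcname=rel_path)
    return {
        "missing.txt": missing_list,
        "found.txt": found_list,
        "matched_files.zip": buffer.getvalue(),
    }
```

    In a Streamlit app, each value could then be handed to `st.download_button` as its `data` argument.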

    Before and After

    Before

    • Two days clicking through subfolders
    • Mental tracking of 40 checklist items
    • Easy to miss a file or misclassify one
    • No clean handoff list for the client

    After

    • One scan in seconds
    • Coverage report grouped by category
    • Every gap surfaced automatically
    • ZIP and lists exported in one click

    The Honest Part: It Is Not Perfect, and That Is Fine

    A tool like this is only as good as its keywords. Some matches come back as false positives because a relevant word appears in a document that is not actually what the checklist required. That is expected.
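
    A tiny example shows how such a false positive arises. Both sample texts are invented, and plain substring matching is an assumption about the search.

```python
def contains_term(text, term):
    """Case-insensitive substring check."""
    return term.lower() in text.lower()


cover_email = "Hi team, the quarterly invoice will follow next week once finance signs off."
actual_invoice = "QUARTERLY INVOICE\nInvoice no. 0042\nAmount due: 1,250.00"

print(contains_term(cover_email, "quarterly invoice"))     # True, yet this is only an email about the invoice
print(contains_term(actual_invoice, "quarterly invoice"))  # True, the document the checklist wants
```

    The keyword search cannot tell these two apart, which is why the flagged matches still get a human look.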

    The app is not meant to replace human judgment. It replaces the manual grunt work of opening and cross-referencing a thousand files. You still review the flagged matches. But instead of two days in a folder, you spend an hour reviewing a structured report.

    The distinction matters. Your time should go toward thinking, not clicking.

    The Result

    A review that would have taken days collapsed into a single scan and a focused human review.

    • Confirmed strong coverage across years of financial records
    • Surfaced the exact gaps that became the immediate action list
    • Delivered a clean ZIP and missing-items list ready to hand off
    • Replaced two days of manual clicking with one hour of structured review

    Build the tool once. Run it in seconds. Focus on what actually requires your expertise.

    Tools Used

    Python · Streamlit · pdfplumber · python-docx · openpyxl · Custom keyword schema

    Who This Is For

    This case study will resonate with:

    • Compliance, legal, or operations teams sitting on document backlogs of hundreds or thousands of files
    • Businesses preparing for an audit, due-diligence review, or funding round
    • Anyone with a clear checklist of required documents and a messy folder of what they actually have
    • Founders who can describe the gap they need to find but don't want to spend two days clicking through PDFs

    Need help streamlining your document reviews?

    If you are spending days on manual file reviews or compliance checklists, let's talk. I can build a custom tool, or simply organize your documents into a system that works.
