Data readiness sounds like a technical task, but the first pass is business work. Someone needs to decide which source is correct, which answers are approved, which customer details are private, and which cases must be handed to a person.
The source material can be simple: website pages, PDFs, rate sheets, product lists, forms, SOPs, call notes, chat logs, or branch reports. The problem is usually not the file type. The problem is that teams often have three versions of the same answer and no one knows which one is current.
Choose one pilot and one owner
Do not prepare data for every possible AI idea. Pick one pilot first. A support chatbot needs different source material than a reporting assistant. A document sorter needs samples and labels. An internal search assistant needs approved staff documents.
Then name one owner. This person does not have to be technical. They need authority to say which source is correct, approve changes, and decide when an answer is no longer valid.
Make an inventory of source material
Create a simple list of the files and systems the pilot will use. For each item, record the owner, last update date, format, language, and whether it contains private data.
- Website pages and product pages.
- FAQ files, help desk scripts, and chat macros.
- PDF policies, rate sheets, admission rules, and branch notices.
- Forms, invoices, KYC samples, and approval checklists.
- Spreadsheets used for reporting or stock checks.
- Real customer messages in English, Nepali, and mixed text.
Ask three staff members where they would find the answer to the same customer question. If they point to different files, clean the source before you build.
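A spreadsheet is enough for this inventory. If a team prefers a script, the sketch below keeps the same columns as Python records and flags the items to review first. The field names, the 180-day staleness threshold, and the sample rows are assumptions for illustration, not a required format.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date

# Hypothetical inventory record: one row per file or system the pilot will use.
@dataclass
class SourceItem:
    name: str
    owner: str              # empty string if nobody has claimed the file
    last_updated: date
    fmt: str                # e.g. "pdf", "web", "csv"
    language: str           # e.g. "en", "ne", "mixed"
    has_private_data: bool

STALE_AFTER_DAYS = 180      # assumed threshold; pick one that fits your update cycle

def review_flags(item: SourceItem, today: date) -> list[str]:
    """Return the reasons this item needs attention before the pilot uses it."""
    flags = []
    if not item.owner:
        flags.append("no owner")
    if (today - item.last_updated).days > STALE_AFTER_DAYS:
        flags.append("possibly stale")
    if item.has_private_data:
        flags.append("private data: mask before sharing")
    return flags

inventory = [
    SourceItem("Return policy page", "Support lead", date(2024, 5, 1), "web", "en", False),
    SourceItem("Rate sheet", "", date(2023, 1, 15), "pdf", "en", False),
    SourceItem("Chat log export", "Ops manager", date(2024, 6, 1), "csv", "mixed", True),
]

today = date(2024, 6, 30)
for item in inventory:
    flags = review_flags(item, today)
    if flags:
        print(f"{item.name}: {', '.join(flags)}")
```

Here the rate sheet is flagged for having no owner and for its age, and the chat export is flagged for private data. The return policy page passes.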
Remove weak source material
AI systems repeat the source they are given. If the source is wrong, the output will be wrong. Before the pilot, remove old files, duplicate answers, draft policies, expired prices, and documents that no one has approved.
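A first cleaning pass can be partly mechanical. The sketch below assumes each answer or document has already been given a status and an optional expiry date. These field names are hypothetical. The pass drops drafts, expired items, and copies that differ only in case or spacing.

```python
from datetime import date

# Hypothetical records for answers or documents in the pilot's source set.
sources = [
    {"id": "faq-01", "text": "Delivery inside Kathmandu takes 2 days.", "status": "approved", "expires": None},
    {"id": "faq-02", "text": "Delivery inside  Kathmandu takes 2 days. ", "status": "approved", "expires": None},
    {"id": "rate-2023", "text": "Old rate sheet", "status": "approved", "expires": date(2023, 12, 31)},
    {"id": "policy-draft", "text": "Draft return policy", "status": "draft", "expires": None},
]

def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so near-identical copies compare equal."""
    return " ".join(text.lower().split())

def keep_usable(records, today):
    kept, seen = [], set()
    for r in records:
        if r["status"] != "approved":
            continue  # drafts and unapproved files stay out of the pilot
        if r["expires"] is not None and r["expires"] < today:
            continue  # expired prices, notices, or rules
        key = normalize(r["text"])
        if key in seen:
            continue  # duplicate of an answer already kept
        seen.add(key)
        kept.append(r)
    return kept

usable = keep_usable(sources, date(2024, 6, 30))
print([r["id"] for r in usable])
```

This catches only exact copies after normalization. Reworded answers that conflict with each other still need the owner to choose one.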
Keep a short change log. If you update a rule, note who approved it and when. That record becomes useful when staff ask why the bot or assistant answered in a certain way.
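The change log can be a plain CSV file that anyone can open. The sketch below assumes four columns: date, source, change, and approver. The file name and column names are illustrative. It writes UTF-8 so that Nepali text in the notes is kept intact.

```python
from __future__ import annotations

import csv
import tempfile
from datetime import date
from pathlib import Path

LOG_FIELDS = ["date", "source", "change", "approved_by"]  # assumed columns

def log_change(log_path: Path, source: str, change: str, approved_by: str, when: date) -> None:
    """Append one row to the change log, writing the header if the file is new."""
    is_new = not log_path.exists()
    with log_path.open("a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=LOG_FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow({"date": when.isoformat(), "source": source,
                         "change": change, "approved_by": approved_by})

def read_log(log_path: Path) -> list[dict]:
    with log_path.open(newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))

# Example run in a temporary folder so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "change_log.csv"
    log_change(path, "rate-sheet.pdf", "Updated delivery fee", "Branch manager", date(2024, 7, 1))
    rows = read_log(path)
    print(rows)
```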
Protect private data early
Private customer details should not be thrown into a pilot without rules. Before you share samples, decide what to mask or remove: phone numbers, account numbers, health details, citizenship data, bank details, addresses, and internal staff notes.
For some pilots, you can test with fake but realistic samples. For others, real examples are needed. In that case, limit access, remove fields that are not needed, and keep logs of who reviewed the material.
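The two steps in the paragraph above can be done in order. First drop the fields the pilot does not need, then mask what remains in free text. The sketch below uses regular expressions with assumed patterns. The phone pattern targets 10-digit mobile numbers starting with 97 or 98, with an optional +977 prefix. Any other run of 9 or more digits is treated as a possible account or ID number. The field names are hypothetical. Check every pattern against your own samples, because regex masking will miss details written in unexpected ways.

```python
import re

# Assumed patterns for illustration only; email runs first so digits inside addresses are not split.
PATTERNS = [
    ("[EMAIL]", re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")),
    ("[PHONE]", re.compile(r"(?<!\d)(?:\+?977[-\s]?)?9[78]\d{8}(?!\d)")),
    ("[NUMBER]", re.compile(r"(?<!\d)\d{9,}(?!\d)")),  # possible account or ID numbers
]

KEEP_FIELDS = {"message", "channel", "label"}  # hypothetical: only what the pilot needs

def mask_text(text: str) -> str:
    """Replace each matched pattern with its placeholder."""
    for placeholder, pattern in PATTERNS:
        text = pattern.sub(placeholder, text)
    return text

def prepare_sample(record: dict) -> dict:
    """Drop fields the pilot does not need, then mask the message text."""
    kept = {k: v for k, v in record.items() if k in KEEP_FIELDS}
    if "message" in kept:
        kept["message"] = mask_text(kept["message"])
    return kept

sample = {
    "customer_name": "Sita",
    "citizenship_no": "12-34-56-7890",
    "message": "Mero order aayena. Call 9812345678 or mail sita@example.com",
    "channel": "whatsapp",
    "label": "delivery",
}
print(prepare_sample(sample))
```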
Label examples for document and message work
If the pilot sorts documents, leads, complaints, or support messages, the system needs examples with labels. Keep the labels small and useful. For a support inbox, labels might be delivery, return, payment, complaint, booking, or lead. For a document queue, labels might be approved, missing field, duplicate, expired, or manual review.
Use examples that reflect how people actually write in Nepal. Include Devanagari, Romanized Nepali, English, mixed messages, spelling drift, and local place names. A polished English sample set will not prepare the system for real customer text.
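Before using a labeled set, it helps to confirm two things. Every label should come from the agreed short list, and the samples should cover more than one script. The sketch below assumes the support-inbox labels named above and tags each message by script using the Unicode Devanagari block (U+0900 to U+097F). The script check cannot separate Romanized Nepali from English, because both use Latin letters, so that split still needs a person.

```python
from collections import Counter

ALLOWED_LABELS = {"delivery", "return", "payment", "complaint", "booking", "lead"}

def script_of(text: str) -> str:
    """Rough script tag based on which letters appear in the text."""
    has_dev = any("\u0900" <= ch <= "\u097F" for ch in text)
    has_lat = any(ch.isascii() and ch.isalpha() for ch in text)
    if has_dev and has_lat:
        return "mixed"
    if has_dev:
        return "devanagari"
    return "latin" if has_lat else "other"

def check_examples(examples):
    """Return unknown labels, counts per label, and counts per script."""
    bad = [(text, label) for text, label in examples if label not in ALLOWED_LABELS]
    labels = Counter(label for _, label in examples)
    scripts = Counter(script_of(text) for text, _ in examples)
    return bad, labels, scripts

examples = [
    ("Mero parcel kahile aaucha?", "delivery"),
    ("सामान कहिले आउँछ?", "delivery"),
    ("मेरो पैसा काटियो तर order भएन", "payment"),
    ("I want to return these shoes", "return"),
    ("Pokhara ma room book garna milcha?", "booking"),
    ("saman ramro chaina", "complain"),  # label typo: should be reported
]
bad, labels, scripts = check_examples(examples)
print("Unknown labels:", bad)
print("Per label:", dict(labels))
print("Per script:", dict(scripts))
```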
Write the rules the system must follow
The rules do not need to be long. They need to be clear.
- Which topics can the system answer?
- Which topics must go to staff?
- What should happen when the answer is missing?
- Which data can be stored?
- Which answers need approval before customers see them?
- Who updates the source after prices, dates, or policies change?
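Writing these answers down as configuration makes them easy to review and test. The sketch below is one possible encoding, not a standard format. The topic names, the `route` function, and its three outcomes (`answer`, `draft_for_review`, `handoff`) are hypothetical. It also assumes a policy of handing off whenever the approved source has no answer.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional

@dataclass
class PilotRules:
    answer_topics: set[str]                                 # the system may answer these
    staff_topics: set[str]                                  # always sent to a person
    needs_approval: set[str] = field(default_factory=set)   # replies reviewed before sending
    store_fields: set[str] = field(default_factory=set)     # data the pilot may keep
    source_owner: str = ""                                  # updates sources after changes

def route(rules: PilotRules, topic: Optional[str], answer_found: bool) -> str:
    """Decide what the assistant may do with one message (assumed policy)."""
    if topic in rules.staff_topics:
        return "handoff"
    if topic not in rules.answer_topics:
        return "handoff"            # unknown or out-of-scope topic
    if not answer_found:
        return "handoff"            # never guess when the approved source is silent
    if topic in rules.needs_approval:
        return "draft_for_review"
    return "answer"

rules = PilotRules(
    answer_topics={"delivery", "return", "booking"},
    staff_topics={"complaint"},
    needs_approval={"return"},
    store_fields={"message", "label"},
    source_owner="Customer support lead",
)
print(route(rules, "delivery", True))
```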
Test messy examples too
A demo often uses clean questions. Real users rarely write that way. Test half-written messages, angry messages, vague messages, spelling errors, mixed scripts, and questions outside the allowed topics.
The pilot should fail in a controlled way. A good failure says it cannot answer, asks for a missing detail, or sends the case to staff. A bad failure invents an answer or hides the handoff.
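A small test set can list, for each messy message, which outcomes count as a controlled result. The sketch below runs those cases against a responder function and reports every case that produced a different outcome. The outcome names and the `toy_respond` stand-in are hypothetical. In practice you would call the real pilot instead. The toy version answers everything it does not recognize, which is the bad failure described above, and the harness catches it.

```python
from typing import Callable

# Each case: a messy message and the outcomes that count as a controlled result.
CASES = [
    ("delivery kaile", {"answer", "ask_detail"}),
    ("ORDER ID 4471 STILL NOT HERE!!! worst service", {"handoff"}),
    ("price?", {"ask_detail"}),
    ("can you help with my visa application", {"handoff"}),
]

def run_cases(respond: Callable[[str], str], cases) -> list:
    """Return (message, outcome) pairs where the outcome was not acceptable."""
    failures = []
    for message, acceptable in cases:
        outcome = respond(message)
        if outcome not in acceptable:
            failures.append((message, outcome))
    return failures

def toy_respond(message: str) -> str:
    """Stand-in for the real pilot, for illustration only."""
    text = message.lower()
    if "worst" in text or "!!!" in text:
        return "handoff"
    return "answer"  # careless fallback: this is the invented-answer failure

failures = run_cases(toy_respond, CASES)
for message, outcome in failures:
    print(f"Uncontrolled result {outcome!r} for: {message}")
```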
Readiness checklist
- One pilot is chosen.
- One business owner is named.
- Source files are current and approved.
- Private fields are removed or masked.
- Real examples are labeled.
- Nepali, English, and mixed text are included.
- Human handoff rules are written.
- Weekly review time is scheduled.
Decide what success looks like
Data readiness is not complete just because a folder is tidy. It is complete when the pilot can be tested against a clear result. That result might be fewer repeated support replies, faster lead handoff, shorter document review time, or less time searching internal files.
Pick one metric. Read the logs alongside that metric. The logs will show which source files need repair before the next version.
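As an illustration, suppose the chosen metric is the share of conversations resolved without staff and without a wrong answer. The sketch below computes that metric from exported log rows and then counts which source files sit behind the answers staff flagged as wrong. The log fields and sample rows are assumptions. A real export will look different.

```python
from collections import Counter

# Hypothetical log rows exported from the pilot: one per conversation.
logs = [
    {"outcome": "answered", "source": "faq.pdf", "staff_flagged_wrong": False},
    {"outcome": "answered", "source": "rate-sheet.pdf", "staff_flagged_wrong": True},
    {"outcome": "handoff", "source": None, "staff_flagged_wrong": False},
    {"outcome": "answered", "source": "rate-sheet.pdf", "staff_flagged_wrong": True},
    {"outcome": "answered", "source": "faq.pdf", "staff_flagged_wrong": False},
]

# The one metric: share of conversations answered correctly without a person.
resolved = sum(1 for r in logs if r["outcome"] == "answered" and not r["staff_flagged_wrong"])
resolution_rate = resolved / len(logs)

# Read the logs beside the metric: which sources sit behind the wrong answers?
wrong_by_source = Counter(r["source"] for r in logs if r["staff_flagged_wrong"])

print(f"Resolved without staff: {resolution_rate:.0%}")
print("Sources to repair:", wrong_by_source.most_common())
```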