Chat with Documents

The Next Step in Information Processing

Instead of relying on public datasets and general knowledge, "Chat with Documents" generates context-specific answers and analyses based on your trusted internal sources. Upload your documents and use them as a basis for answering questions in the chat!

Solving Data Limitations

When you ask a language model a question, you depend on the data the model was trained on. This is generally information retrieved from the internet, so non-public sources are unlikely to be part of it. By using your documents as a source for the chat, you ensure that the model has the information it needs to answer your questions.

Possibilities with Your Documents

You can ask questions about your documents, for example to list a document's main points or to summarize it. You can also have the language model perform specific analyses on your own data.

Drawbacks of Document-Based Chatting

Uploading and processing documents are extra steps that you don't need if the model can already give good answers without the context of your documents. Generating an answer also takes longer, because the relevant information must first be retrieved from the documents before the request can be sent to the language model.

Behind the Scenes of Chatting with Documents

The text from the documents you upload is extracted and divided into chunks. Each chunk has a fixed length of 1024 characters, and consecutive chunks overlap by 128 characters. Each chunk is converted into a vector and stored in a vector database. For each question, the chunks most similar to the question are selected from this database.
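The chunking step can be illustrated with a minimal Python sketch. It uses the chunk size (1024) and overlap (128) mentioned above. The function name `split_into_chunks` and the plain character-based splitting are assumptions made for illustration, not AI-School's actual implementation.

```python
def split_into_chunks(text: str, chunk_size: int = 1024, overlap: int = 128) -> list[str]:
    """Split text into fixed-size character chunks that overlap (illustrative sketch)."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    # Each chunk starts this many characters after the previous one.
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the text
    return chunks


# Usage: the end of one chunk is repeated at the start of the next.
sample = "".join(chr(97 + i % 26) for i in range(2000))
chunks = split_into_chunks(sample)
print(len(chunks), [len(c) for c in chunks])
```

The overlap means that a sentence cut off at a chunk boundary still appears in full, or nearly in full, in one of the two neighboring chunks.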

Document Fragment Selection Process

The text chunks have already been converted into vectors. A vector has many dimensions, and texts with a similar meaning get vectors that lie close together. Think of the RGB color system: two colors with nearly the same RGB values look nearly the same. The vector database lets us retrieve text chunks ranked and filtered by their similarity to the question being asked. We select a maximum of 100 text chunks of 1024 characters each to send along with the question.
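The selection step can be sketched as follows. The text above does not say which similarity measure or embedding model is used, so cosine similarity is an assumption here, and the names `cosine_similarity` and `select_chunks` are hypothetical. The usage lines borrow the RGB analogy: three-dimensional color values stand in for real text vectors, which have far more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Similarity of two vectors; 1.0 means they point in the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_chunks(question_vector, stored_chunks, max_chunks=100):
    """Return the texts of the chunks most similar to the question.

    stored_chunks is a list of (text, vector) pairs. Assumption: ranking by
    cosine similarity; a real vector database does this search internally.
    """
    ranked = sorted(
        stored_chunks,
        key=lambda item: cosine_similarity(question_vector, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:max_chunks]]


# Usage with RGB values as stand-in vectors.
stored = [
    ("green", (0, 255, 0)),
    ("orange", (255, 165, 0)),
    ("red", (255, 0, 0)),
]
print(select_chunks((250, 10, 10), stored, max_chunks=2))  # ['red', 'orange']
```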

Suitable Models for Document-Based Chatting

We have selected models with a large context window to enable chatting with documents. We want to be able to send up to 100 text chunks of 1024 characters each, which adds up to 102,400 characters. Models like GPT-3.5 cannot process that much text. Therefore, we recommend using this feature only in combination with GPT-4.1, Gemini 2.5 Pro, and Claude 4.0.
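The arithmetic behind this requirement can be written out in a few lines of Python. The chunk count and chunk size come from the text above. The ratio of roughly 4 characters per token and the window sizes in the usage lines are rough assumptions for illustration only, not figures published by AI-School or by a model provider.

```python
CHUNK_SIZE = 1024  # characters per chunk
MAX_CHUNKS = 100   # maximum number of chunks sent with a question


def max_context_characters(chunk_size=CHUNK_SIZE, max_chunks=MAX_CHUNKS):
    """Largest amount of document text that can accompany one question."""
    return chunk_size * max_chunks


def fits_in_context(window_tokens, chars_per_token=4.0):
    """Rough check whether the maximum document context fits in a model window.

    chars_per_token is an assumed average; the real ratio depends on the
    tokenizer and on the language of the text.
    """
    estimated_tokens = max_context_characters() / chars_per_token
    return estimated_tokens <= window_tokens


print(max_context_characters())   # 102400
print(fits_in_context(16_000))    # False: hypothetical small window
print(fits_in_context(128_000))   # True: hypothetical large window
```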

Suitable Models

Suitable models are GPT-4.1, Gemini 2.5 Pro, and Claude 4.0.

Select One or Multiple Documents

You can turn on file mode by clicking the paperclip on the right side of the question bar. You can choose up to 10 files to chat with.

Suitable Language Models

When you start chatting with documents, AI-School checks whether the selected language model is suitable for this. If it is not, GPT-4o is selected automatically.

Chat with Documents

You keep chatting with these documents for as long as file mode is on.

Process Per File

In addition to chatting with documents, AI-School also offers the ability to apply a prompt separately to each document and receive individual responses. This feature is called Process per file.

Process per file

This feature can be used in combination with "Chat with files".

Possible Scenario

A practical example of using "Process per file":

You upload the test and the answer model and enable them in Chat with files
You upload multiple submitted tests and enable them in Process per file
You formulate a prompt that is applied to all files individually

This way, you can, for example, have all submitted tests automatically graded based on the answer model.

Maximum Number of Files

There is a maximum of 30 files for the "Process per file" feature.
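Conceptually, "Process per file" applies one prompt to each file in turn and collects a separate response per file. The sketch below illustrates that idea under stated assumptions: `process_per_file`, `ask_model`, and the way the request text is assembled are hypothetical and do not describe AI-School's internals. Only the limit of 30 files is taken from the text above.

```python
from typing import Callable

MAX_FILES = 30  # limit stated for "Process per file"


def process_per_file(
    prompt: str,
    files: dict[str, str],
    shared_context: str,
    ask_model: Callable[[str], str],
) -> dict[str, str]:
    """Apply one prompt to every file separately and return one response per file.

    files maps a file name to its extracted text. shared_context stands for
    the documents enabled in "Chat with files" (for example the answer model).
    ask_model is a placeholder for whatever sends a request to the language model.
    """
    if len(files) > MAX_FILES:
        raise ValueError(f"at most {MAX_FILES} files can be processed per file")
    results = {}
    for name, text in files.items():
        request = (
            f"{prompt}\n\n"
            f"Reference documents:\n{shared_context}\n\n"
            f"File to process ({name}):\n{text}"
        )
        results[name] = ask_model(request)
    return results
```

In the grading scenario, `shared_context` would hold the test and the answer model, `files` the submitted tests, and `prompt` the grading instruction.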

Supported File Types

AI-School supports various file types for chatting with documents:

PDF files ending in .pdf
Word files ending in .docx
CSV files ending in .csv
JSON files ending in .json
Text files ending in .txt
Audio and video files ending in .mp3, .mp4, .mpeg, .mpga, .m4a, .wav, or .webm
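A check against this list could look like the following sketch. The extensions are taken from the list above. The function name `classify_upload`, the split into "document" and "media", and the case-insensitive matching are assumptions for illustration.

```python
from pathlib import Path

DOCUMENT_EXTENSIONS = {"pdf", "docx", "csv", "json", "txt"}
MEDIA_EXTENSIONS = {"mp3", "mp4", "mpeg", "mpga", "m4a", "wav", "webm"}


def classify_upload(filename: str):
    """Return "document", "media", or None for an unsupported file type."""
    # Assumption: the extension is compared case-insensitively.
    extension = Path(filename).suffix.lower().lstrip(".")
    if extension in DOCUMENT_EXTENSIONS:
        return "document"
    if extension in MEDIA_EXTENSIONS:
        return "media"
    return None


print(classify_upload("report.PDF"))   # document
print(classify_upload("lecture.mp4"))  # media
print(classify_upload("photo.png"))    # None
```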

Chatting with Audio or Video Files

For chatting with audio or video files, AI-School uses OpenAI's Whisper model to transcribe the spoken content into text.

After transcription, we run the text through GPT-4o to check and correct punctuation and spelling.

The corrected text then goes through the same procedure as text extracted from PDF or Word documents.

Whisper has a limit of 25 MB per audio or video file. We therefore apply the same limit when uploading new files.
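The order of these steps can be sketched as below. The callables `transcribe` and `correct` are placeholders for the Whisper and GPT-4o calls and are not real API functions. Reading 25 MB as 25 × 1024 × 1024 bytes is also an assumption, since the text does not say how the limit is counted.

```python
from typing import Callable

# Assumption: 25 MB interpreted as 25 * 1024 * 1024 bytes.
MAX_MEDIA_BYTES = 25 * 1024 * 1024


def prepare_media_text(
    media: bytes,
    transcribe: Callable[[bytes], str],
    correct: Callable[[str], str],
    max_bytes: int = MAX_MEDIA_BYTES,
) -> str:
    """Turn an audio or video file into corrected text (illustrative sketch)."""
    if len(media) > max_bytes:
        raise ValueError("audio or video file exceeds the upload limit")
    raw_text = transcribe(media)  # placeholder for the speech-to-text step
    return correct(raw_text)      # placeholder for the spelling and punctuation pass
```

The returned text would then be chunked and stored in the same way as text from any other document.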

Example Files You Can Download

Large History Document
