hasanbasbunar committed
Commit 8c1d8a0 · 1 Parent(s): 8e5bf9c

Initial commit

Files changed (5)
  1. .gitignore +92 -0
  2. README.md +58 -0
  3. app.py +528 -0
  4. c29ca011-87ff-45b0-8236-08d629812732.svg +155 -0
  5. requirements.txt +55 -0
.gitignore ADDED
@@ -0,0 +1,92 @@
+ # Byte-compiled / optimized / DLL files
+ __pycache__/
+ *.py[cod]
+ *$py.class
+
+ # C extensions
+ *.so
+
+ # Distribution / packaging
+ .Python
+ env/
+ venv/
+ build/
+ develop-eggs/
+ dist/
+ downloads/
+ eggs/
+ .eggs/
+ lib/
+ lib64/
+ parts/
+ sdist/
+ var/
+ *.egg-info/
+ .installed.cfg
+ *.egg
+
+ # Installer logs
+ pip-log.txt
+ pip-delete-this-directory.txt
+
+ # Unit test / coverage reports
+ htmlcov/
+ .tox/
+ .nox/
+ .coverage
+ .coverage.*
+ .cache
+ nosetests.xml
+ coverage.xml
+ *.cover
+ .hypothesis/
+ .pytest_cache/
+
+ # Jupyter Notebook
+ .ipynb_checkpoints
+
+ # pyenv
+ .python-version
+
+ # mypy
+ .mypy_cache/
+ .dmypy.json
+
+ # Pyre type checker
+ .pyre/
+
+ # VS Code
+ .vscode/
+
+ # MacOS
+ .DS_Store
+
+ # Gradio temp files
+ *.gradio*
+
+ # Logs
+ *.log
+
+ # System files
+ Thumbs.db
+
+ # Secrets
+ *.env
+ .env.*
+
+ # Audio/Video/Temp files
+ *.wav
+ *.mp3
+ *.mp4
+ *.srt
+ *.tmp
+ *.temp
+
+ # Ignore test outputs
+ test_output/
+
+ # Ignore user uploads
+ uploads/
+
+ # Ignore SVGs if generated
+ generated_svg/
README.md CHANGED
@@ -12,3 +12,57 @@ short_description: Chat and transcribe audio files with AI, powered by Voxtral.
  ---
 
  Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+
+ # Voxtral
+
+ **Multimodal chatbot and audio transcription web app powered by Gradio and the Mistral API.**
+
+ ## Features
+ - Chatbot with text and audio input
+ - Audio file transcription (with SRT export)
+ - Modern Gradio web interface
+ - API key management (secure, local to the browser)
+
+ ## Demo
+ ![Screenshot](c29ca011-87ff-45b0-8236-08d629812732.svg)
+
+ ## Installation
+
+ 1. **Clone the repository**
+    ```bash
+    git clone <repo-url>
+    cd voxtral-gradio
+    ```
+ 2. **Create and activate a virtual environment**
+    ```bash
+    python3 -m venv .venv
+    source .venv/bin/activate
+    ```
+ 3. **Install dependencies**
+    ```bash
+    pip install -r requirements.txt
+    ```
+
+ ## Usage
+
+ 1. **Run the app**
+    ```bash
+    python app.py
+    ```
+ 2. Open your browser and go to [http://localhost:7860](http://localhost:7860)
+ 3. Enter your Mistral API key in the interface to start chatting or transcribing audio files.
+
+ ## Configuration
+ - **API key:** Your Mistral API key is required for the chat and transcription features. It is stored only in your browser session and is never sent to any third-party server.
+ - **Environment variables:** None are required by default. For cloud deployment, you may need to set the `PORT` environment variable.
+
+ ## Deployment
+ - For production, set `debug=False` in `app.py`.
+ - Compatible with most Python hosting platforms (Heroku, Railway, etc.).
+ - To specify a custom port:
+    ```python
+    demo.launch(server_port=int(os.environ.get("PORT", 7860)), debug=False)
+    ```
+
+ ## License
+ MIT
app.py ADDED
@@ -0,0 +1,528 @@
+ import gradio as gr
+ import httpx
+ import os
+ import json
+ import inspect
+ import aiofiles
+ import asyncio
+ import tempfile
+ import math
+
+ limits = httpx.Limits(max_connections=100, max_keepalive_connections=20)
+ HTTP_CLIENT = httpx.AsyncClient(
+     http2=False,
+     limits=limits,
+     timeout=httpx.Timeout(30.0, pool=10.0)
+ )
+
+ # --- SRT Generation Functions ---
+ def format_srt_time(total_seconds):
+     # Round once on the total milliseconds so inputs just below a whole
+     # second cannot produce an invalid 1000 ms field.
+     total_ms = round(total_seconds * 1000)
+     hours, rem = divmod(total_ms, 3_600_000)
+     minutes, rem = divmod(rem, 60_000)
+     seconds, milliseconds = divmod(rem, 1000)
+     return f"{hours:02d}:{minutes:02d}:{seconds:02d},{milliseconds:03d}"
+
+ def generate_srt_file(json_data):
+     if not json_data or "segments" not in json_data or not json_data["segments"]:
+         print("No segments to convert to SRT.")
+         return None
+
+     srt_content = ""
+     for index, segment in enumerate(json_data["segments"]):
+         sequence = index + 1
+         start_time = format_srt_time(segment['start'])
+         end_time = format_srt_time(segment['end'])
+         text = segment['text'].strip()
+         srt_content += f"{sequence}\n{start_time} --> {end_time}\n{text}\n\n"
+
+     try:
+         with tempfile.NamedTemporaryFile(mode='w+', delete=False, suffix='.srt', encoding='utf-8') as temp_file:
+             temp_file.write(srt_content)
+             print(f"Temporary SRT file created at: {temp_file.name}")
+             return temp_file.name
+     except Exception as e:
+         print(f"Error creating SRT file: {e}")
+         return None
+
+ # --- Tools for Chatbot ---
+ def get_city_info(city: str):
+     if "paris" in city.lower():
+         return json.dumps({"population": "2.1 million", "monument": "Eiffel Tower", "fact": "Paris is known as the 'City of Light'."})
+     elif "tokyo" in city.lower():
+         return json.dumps({"population": "14 million", "monument": "Tokyo Tower", "fact": "Tokyo is the largest metropolitan area in the world."})
+     else:
+         return json.dumps({"error": f"Sorry, I don't have information about {city}."})
+
+ available_tools = {"get_city_info": get_city_info}
+ tools_schema = [
+     {"type": "function", "function": {"name": "get_city_info", "description": "Get information about a specific city.", "parameters": {"type": "object", "properties": {"city": {"type": "string", "description": "The name of the city, e.g., 'Paris'."}}, "required": ["city"]}}}
+ ]
+
+ async def upload_file_to_public_service(filepath: str):
+     if not filepath:
+         return None
+     url = "https://uguu.se/upload"
+     try:
+         async with aiofiles.open(filepath, 'rb') as f:
+             content = await f.read()
+         files = {'files[]': (os.path.basename(filepath), content)}
+         response = await HTTP_CLIENT.post(url, files=files, timeout=30.0)
+         response.raise_for_status()
+         result = response.json()
+         if "files" in result and result["files"] and "url" in result["files"][0]:
+             full_url = result["files"][0]["url"]
+             print(f"File successfully uploaded: {full_url}")
+             return full_url
+         else:
+             print(f"Upload API response error: {result}")
+             return None
+     except httpx.HTTPStatusError as e:
+         print(f"HTTP error during upload: {e.response.status_code} - {e.response.text}")
+         return None
+     except httpx.RequestError as e:
+         print(f"Connection error during upload: {e}")
+         return None
+     except (IOError, FileNotFoundError) as e:
+         print(f"File read error: {e}")
+         return None
+     except (KeyError, IndexError) as e:
+         print(f"Unexpected JSON response structure: {e}")
+         return None
+
+ async def handle_api_call(api_key, messages, model_chat):
+     headers = {"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"}
+     api_url = "https://api.mistral.ai/v1/chat/completions"
+     json_data = {
+         "model": model_chat,
+         "messages": messages,
+         "tools": tools_schema,
+         "tool_choice": "auto"
+     }
+     return await HTTP_CLIENT.post(api_url, headers=headers, json=json_data, timeout=60.0)
+
+ async def handle_chat_submission(api_key, user_message, chat_history_api, audio_url, model_chat):
+     if not api_key:
+         user_content = [{"type": "text", "text": user_message}] if user_message else []
+         updated_history = chat_history_api + [{"role": "user", "content": user_content}, {"role": "assistant", "content": "Error: API key not configured."}]
+         return updated_history, ""
+
+     current_user_content = []
+     if audio_url:
+         current_user_content.append({"type": "input_audio", "input_audio": {"data": audio_url, "format": "mp3"}})
+     if user_message:
+         current_user_content.append({"type": "text", "text": user_message})
+     if not current_user_content:
+         return chat_history_api, ""
+     chat_history_api.append({"role": "user", "content": current_user_content})
+     try:
+         response = await handle_api_call(api_key, chat_history_api, model_chat)
+         if response.status_code != 200:
+             error_msg = response.json().get("message", response.text)
+             chat_history_api.append({"role": "assistant", "content": f"API Error: {error_msg}"})
+             return chat_history_api, ""
+         assistant_message = response.json()['choices'][0]['message']
+     except httpx.HTTPStatusError as e:
+         error_msg = e.response.json().get("message", e.response.text)
+         chat_history_api.append({"role": "assistant", "content": f"API Error: {error_msg}"})
+         return chat_history_api, ""
+     except httpx.RequestError as e:
+         chat_history_api.append({"role": "assistant", "content": f"Connection error: {e}"})
+         return chat_history_api, ""
+     if assistant_message.get("tool_calls"):
+         chat_history_api.append(assistant_message)
+         tool_call = assistant_message["tool_calls"][0]
+         function_name = tool_call['function']['name']
+         function_args = json.loads(tool_call['function']['arguments'])
+         if function_name in available_tools:
+             tool_call_id = tool_call['id']
+             function_to_call = available_tools[function_name]
+             if inspect.iscoroutinefunction(function_to_call):
+                 tool_output = await function_to_call(**function_args)
+             else:
+                 tool_output = function_to_call(**function_args)
+             chat_history_api.append({
+                 "role": "tool",
+                 "tool_call_id": tool_call_id,
+                 "content": tool_output
+             })
+             try:
+                 second_response = await handle_api_call(api_key, chat_history_api, model_chat)
+                 if second_response.status_code != 200:
+                     error_msg = second_response.json().get("message", second_response.text)
+                     chat_history_api.append({"role": "assistant", "content": f"API Error after tool call: {error_msg}"})
+                 else:
+                     chat_history_api.append(second_response.json()['choices'][0]['message'])
+             except httpx.HTTPStatusError as e:
+                 error_msg = e.response.json().get("message", e.response.text)
+                 chat_history_api.append({"role": "assistant", "content": f"API Error after tool call: {error_msg}"})
+             except httpx.RequestError as e:
+                 chat_history_api.append({"role": "assistant", "content": f"Connection error after tool call: {e}"})
+         else:
+             chat_history_api.append({"role": "assistant", "content": f"Error: Unknown tool '{function_name}'."})
+     else:
+         chat_history_api.append(assistant_message)
+     return chat_history_api, ""
+
+ def format_history_for_display(api_history):
+     display_messages = []
+     for msg in api_history:
+         if msg['role'] == 'user':
+             text_content = ""
+             has_audio = False
+             if isinstance(msg.get('content'), list):
+                 text_part = next((part['text'] for part in msg['content'] if part['type'] == 'text'), None)
+                 if text_part:
+                     text_content = text_part
+                 if any(part['type'] == 'input_audio' for part in msg['content']):
+                     has_audio = True
+             elif isinstance(msg.get('content'), str):
+                 text_content = msg['content']
+             display_content = f"🎤 {text_content}" if has_audio else text_content
+             if display_content:
+                 display_messages.append({"role": "user", "content": display_content})
+         elif msg['role'] == 'assistant':
+             if msg.get("tool_calls"):
+                 tool_call = msg["tool_calls"][0]
+                 func_name = tool_call['function']['name']
+                 func_args = tool_call['function']['arguments']
+                 tool_display_content = f"⚙️ *Calling tool `{func_name}` with arguments: `{func_args}`...*"
+                 display_messages.append({"role": "assistant", "content": tool_display_content})
+             elif msg.get('content'):
+                 display_messages.append({"role": "assistant", "content": msg['content']})
+     return display_messages
+
+ async def transcribe_audio(api_key, source_type, audio_file_path, audio_url, add_timestamps, model_transcription):
+     if not api_key:
+         return {"error": "Please enter your API key first."}
+     headers = {"Authorization": f"Bearer {api_key}"}
+     api_url = "https://api.mistral.ai/v1/audio/transcriptions"
+     try:
+         payload = {'model': (None, model_transcription)}
+         if add_timestamps:
+             payload['timestamp_granularities'] = (None, 'segment')
+         if source_type == "Upload a file":
+             if not audio_file_path:
+                 return {"error": "Please upload an audio file."}
+             async with aiofiles.open(audio_file_path, "rb") as f:
+                 content = await f.read()
+             payload['file'] = (os.path.basename(audio_file_path), content, "audio/mpeg")
+             response = await HTTP_CLIENT.post(api_url, headers=headers, files=payload, timeout=120.0)
+         elif source_type == "Use a URL":
+             if not audio_url:
+                 return {"error": "Please provide an audio URL."}
+             payload['file_url'] = (None, audio_url)
+             response = await HTTP_CLIENT.post(api_url, headers=headers, files=payload, timeout=120.0)
+         else:
+             return {"error": "Invalid source type."}
+         response.raise_for_status()
+         return response.json()
+     except httpx.HTTPStatusError as e:
+         try:
+             await e.response.aread()
+             details = e.response.json().get("message", e.response.text)
+         except Exception:
+             details = e.response.text
+         return {"error": f"API Error {e.response.status_code}", "details": details}
+     except httpx.RequestError as e:
+         return {"error": "Connection error", "details": str(e)}
+     except IOError as e:
+         return {"error": "File reading error", "details": str(e)}
+
+ async def run_transcription_and_update_ui(api_key, source, file_path, url, timestamps, model_transcription):
+     yield {
+         transcription_button: gr.update(value="⏳ Transcription in progress...", interactive=False),
+         transcription_status: gr.update(value="*Starting transcription...*", visible=True),
+         transcription_output: gr.update(visible=False),
+         download_zone: gr.update(visible=False),
+         download_file_output: gr.update(value=None)
+     }
+     json_result = await transcribe_audio(api_key, source, file_path, url, timestamps, model_transcription)
+     has_segments = isinstance(json_result, dict) and "segments" in json_result and json_result.get("segments")
+     is_error = isinstance(json_result, dict) and "error" in json_result
+     if is_error:
+         error_title = json_result.get("error", "Unknown error")
+         error_details = json_result.get("details", "No details available.")
+         yield {
+             transcription_status: gr.update(value=f"### ❌ {error_title}\n\n*_{error_details}_*", visible=True),
+             transcription_button: gr.update(value="▶️ Start transcription", interactive=True)
+         }
+     elif has_segments:
+         yield {
+             transcription_status: gr.update(value="### ✔️ Transcription complete!", visible=True),
+             transcription_output: gr.update(value=json_result, visible=True),
+             download_zone: gr.update(visible=True),
+             transcription_button: gr.update(value="▶️ Start transcription", interactive=True)
+         }
+     else:
+         text_result = json_result.get('text', "No text detected.")
+         yield {
+             transcription_status: gr.update(value=f"### ⚠️ Partial result\n\n_{text_result}_", visible=True),
+             transcription_output: gr.update(value=json_result, visible=True),
+             download_zone: gr.update(visible=False),
+             transcription_button: gr.update(value="▶️ Start transcription", interactive=True)
+         }
+
+ theme = gr.themes.Origin(
+     primary_hue="orange",
+     secondary_hue="gray",
+     neutral_hue="zinc",
+     text_size="md",
+     spacing_size="md",
+     radius_size="xxl",
+     font=("Inter", "IBM Plex Sans", "ui-sans-serif", "system-ui", "sans-serif"),
+ ).set(
+     body_background_fill="#f7f8fa",
+     block_background_fill="#fff",
+     block_shadow="0 4px 24px 0 #0001, 0 1.5px 4px 0 #0001",
+     block_border_width="1px",
+     block_border_color="#ececec",
+     button_primary_background_fill="#223a5e",
+     button_primary_background_fill_hover="#1a2c47",
+     button_primary_text_color="#fff",
+     input_border_color="#e5e7eb",
+     input_border_color_focus="#223a5e",
+     input_background_fill="#fafbfc",
+     input_shadow="0 0 0 2px #223a5e22",
+ )
+
+ custom_css = """
+ .gradio-container label,
+ .gradio-container .gr-button,
+ .gradio-container .gr-button span,
+ .gradio-container a {
+     color: #FF6F3C !important;
+ }
+ .gradio-container .gr-button {
+     border-color: #FF6F3C !important;
+     background: #FF6F3C !important;
+     color: #fff !important;
+ }
+ .gradio-container .gr-button:not([disabled]):hover {
+     background: #fff !important;
+     color: #FF6F3C !important;
+     border: 2px solid #FF6F3C !important;
+ }
+ .gradio-container .gr-box, .gradio-container .gr-block {
+     background: #f7f8fa !important;
+ }
+ .gradio-container .gr-input, .gradio-container .gr-textbox, .gradio-container .gr-text-input, .gradio-container .gr-file, .gradio-container .gr-audio {
+     background: #f7f8fa !important;
+     color: #223a5e !important;
+     border: 1.2px solid #FF6F3C !important;
+ }
+ .gradio-container .gr-input::placeholder, .gradio-container .gr-textbox::placeholder, .gradio-container .gr-text-input::placeholder {
+     color: #888 !important;
+ }
+ .gradio-container .gr-markdown, .gradio-container .gr-markdown p {
+     color: #888 !important;
+ }
+ """
+
+ with gr.Blocks(theme=theme, title="Voxtral Pro", css=custom_css) as demo:
+     gr.Markdown("""
+     <div style='text-align:center; margin-bottom:1.5em;'>
+         <h1 style='margin-bottom:0.2em; color:#FF6F3C;'>Voxtral Pro</h1>
+         <div style='font-size:1.1em; color:#555;'>The all-in-one AI assistant for audio, text & productivity.<br>
+         <b style='color:#FF6F3C;'>Fast and powerful.</b></div>
+         <div style='width:60px; height:4px; background:#FF6F3C; margin:18px auto 0 auto; border-radius:2px;'></div>
+     </div>
+     """)
+
+     api_history_state = gr.State([])
+     api_key_state = gr.State()
+     # Dropdowns for model selection
+     model_choices = ["voxtral-mini-2507", "voxtral-small-2507"]
+     chat_model_state = gr.State("voxtral-mini-2507")
+     transcription_model_state = gr.State("voxtral-mini-2507")
+     with gr.Accordion("🔑 API Key Configuration", open=True):
+         with gr.Row():
+             api_key_input = gr.Textbox(label="Mistral API Key", placeholder="Enter your API key here...", type="password", scale=6)
+             chat_model_dropdown = gr.Dropdown(choices=model_choices, value="voxtral-mini-2507", label="Chat Model", scale=2)
+             transcription_model_dropdown = gr.Dropdown(choices=model_choices, value="voxtral-mini-2507", label="Transcription Model", scale=2)
+             save_api_key_button = gr.Button("Save Key", scale=1)
+         api_key_status = gr.Markdown(value="*Please save your API key to use the application.*")
+         gr.Markdown(
+             "<span style='font-size: 0.95em; color: #888;'>🔒 <b>Security:</b> Your API key is stored only in your browser session memory and is never sent to any server except Mistral's API. It is not saved or shared anywhere else.</span>",
+             elem_id="api-key-security-info"
+         )
+
+     with gr.Tabs():
+         with gr.TabItem("💬 Multimodal Chatbot"):
+             gr.Markdown("### Chat with text and audio files at any time.")
+             chatbot_display = gr.Chatbot(
+                 label="Conversation",
+                 height=500,
+                 avatar_images=(None, "c29ca011-87ff-45b0-8236-08d629812732.svg"),
+                 type="messages"
+             )
+             with gr.Row():
+                 audio_input_files = gr.File(
+                     label="Drag and drop your audio files here",
+                     file_count="multiple",
+                     file_types=["audio"],
+                     elem_id="upload-box",
+                     scale=2,
+                     height=100
+                 )
+                 user_textbox = gr.Textbox(
+                     label="Your message",
+                     placeholder="Type your message here...",
+                     lines=2,
+                     scale=6,
+                     elem_id="user-message-box",
+                 )
+                 mic_input = gr.Audio(
+                     label="Voice recording",
+                     sources=["microphone"],
+                     type="filepath",
+                     elem_classes="voice-recorder",
+                     scale=2,
+                 )
+             send_button = gr.Button("Send", variant="primary")
+             clear_button = gr.Button("🗑️ Clear conversation", variant="secondary")
+         with gr.TabItem("🎙️ Audio Transcription"):
+             gr.Markdown("### Transcribe an audio file and export the result.")
+             with gr.Row(variant="panel"):
+                 with gr.Column(scale=1):
+                     gr.Markdown("#### 1. Audio Source")
+                     source_type_transcription = gr.Radio(["Upload a file", "Use a URL"], label="Source type", value="Upload a file")
+                     audio_file_input = gr.Audio(type="filepath", label="Audio file", visible=True)
+                     audio_url_input = gr.Textbox(label="Audio file URL", placeholder="https://.../audio.mp3", visible=False)
+                     gr.Markdown("#### 2. Options")
+                     timestamp_checkbox = gr.Checkbox(label="Include timestamps (for .SRT)", value=True)
+                     transcription_button = gr.Button("▶️ Start transcription", variant="primary")
+                 with gr.Column(scale=2):
+                     gr.Markdown("#### 3. Results")
+                     transcription_status = gr.Markdown(visible=False)
+                     transcription_output = gr.JSON(label="Raw transcription data", visible=False)
+                     with gr.Group(visible=False) as download_zone:
+                         download_srt_button = gr.Button("💾 Download .srt file", variant="secondary")
+                         download_file_output = gr.File(label="Your file is ready:", interactive=False)
+
+     def save_key(api_key):
+         return api_key, "✅ API key saved."
+     save_api_key_button.click(fn=save_key, inputs=[api_key_input], outputs=[api_key_state, api_key_status])
+     # Dropdown logic: update State when dropdown changes
+     def update_chat_model(model):
+         return model
+     def update_transcription_model(model):
+         return model
+     chat_model_dropdown.change(fn=update_chat_model, inputs=[chat_model_dropdown], outputs=[chat_model_state])
+     transcription_model_dropdown.change(fn=update_transcription_model, inputs=[transcription_model_dropdown], outputs=[transcription_model_state])
+     async def on_submit(api_key, user_msg, api_history, uploaded_files, mic_file, chat_model):
+         # 1. Check for API Key
+         if not api_key:
+             api_history.append({"role": "user", "content": user_msg or "..."})
+             api_history.append({"role": "assistant", "content": "Error: Please configure your API key."})
+             yield api_history, format_history_for_display(api_history), "", None, None
+             return
+
+         # 2. Collect all audio file paths
+         all_filepaths = []
+         if uploaded_files:
+             # gr.File may return plain path strings or tempfile wrappers depending on the Gradio version
+             all_filepaths.extend(getattr(p, "name", p) for p in uploaded_files)
+         if mic_file:
+             all_filepaths.append(mic_file)
+
+         # 3. Upload files in parallel and show loading state
+         audio_urls_to_send = []
+         if all_filepaths:
+             audio_count = len(all_filepaths)
+             api_history.append({"role": "user", "content": user_msg or ""})  # Placeholder for display
+             api_history.append({"role": "assistant", "content": f"⏳ *Uploading {audio_count} audio file{'s' if audio_count > 1 else ''}...*"})
+             yield api_history, format_history_for_display(api_history), user_msg, None, None
+
+             upload_tasks = [upload_file_to_public_service(path) for path in all_filepaths]
+             uploaded_urls = await asyncio.gather(*upload_tasks)
+             audio_urls_to_send = [url for url in uploaded_urls if url]
+             api_history.pop()  # Remove loading message
+
+             if len(audio_urls_to_send) != audio_count:
+                 api_history.append({"role": "assistant", "content": f"Error: Failed to upload {audio_count - len(audio_urls_to_send)} file(s)."})
+                 yield api_history, format_history_for_display(api_history), user_msg, None, None
+                 return
+
+         # 4. Construct the user message for the API
+         current_user_content = []
+         for url in audio_urls_to_send:
+             current_user_content.append({"type": "input_audio", "input_audio": {"data": url}})
+         if user_msg:
+             current_user_content.append({"type": "text", "text": user_msg})
+
+         if not current_user_content:
+             yield api_history, format_history_for_display(api_history), "", None, None
+             return
+
+         # If we had a placeholder, replace it. Otherwise, append.
+         if all_filepaths:
+             api_history[-1] = {"role": "user", "content": current_user_content}
+         else:
+             api_history.append({"role": "user", "content": current_user_content})
+
+         # 5. Call API and handle tool calls
+         try:
+             response = await handle_api_call(api_key, api_history, chat_model)
+             response.raise_for_status()
+             assistant_message = response.json()['choices'][0]['message']
+             api_history.append(assistant_message)
+
+             if "tool_calls" in assistant_message and assistant_message["tool_calls"]:
+                 tool_call = assistant_message["tool_calls"][0]
+                 function_name = tool_call['function']['name']
+                 if function_name in available_tools:
+                     function_args = json.loads(tool_call['function']['arguments'])
+                     tool_output = available_tools[function_name](**function_args)
+                     api_history.append({"role": "tool", "tool_call_id": tool_call['id'], "content": tool_output})
+
+                     second_response = await handle_api_call(api_key, api_history, chat_model)
+                     second_response.raise_for_status()
+                     final_message = second_response.json()['choices'][0]['message']
+                     api_history.append(final_message)
+         except Exception as e:
+             error_details = str(e)
+             if getattr(e, 'response', None) is not None:
+                 error_details = e.response.text
+             api_history.append({"role": "assistant", "content": f"API Error: {error_details}"})
+
+         # 6. Final UI update
+         yield api_history, format_history_for_display(api_history), "", None, None
+
+     chat_inputs = [api_key_state, user_textbox, api_history_state, audio_input_files, mic_input, chat_model_state]
+     chat_outputs = [api_history_state, chatbot_display, user_textbox, audio_input_files, mic_input]
+     send_button.click(
+         fn=on_submit,
+         inputs=chat_inputs,
+         outputs=chat_outputs
+     )
+     user_textbox.submit(
+         fn=on_submit,
+         inputs=chat_inputs,
+         outputs=chat_outputs
+     )
+     def clear_chat(): return [], [], "", None, None
+     clear_button.click(fn=clear_chat, outputs=chat_outputs)
+     all_transcription_outputs = [
+         transcription_button,
+         transcription_status,
+         transcription_output,
+         download_zone,
+         download_file_output
+     ]
+     transcription_button.click(
+         fn=run_transcription_and_update_ui,
+         inputs=[api_key_state, source_type_transcription, audio_file_input, audio_url_input, timestamp_checkbox, transcription_model_state],
+         outputs=all_transcription_outputs
+     )
+     download_srt_button.click(
+         fn=generate_srt_file,
+         inputs=[transcription_output],
+         outputs=[download_file_output]
+     )
+     def toggle_transcription_inputs(source_type):
+         return gr.update(visible=source_type == "Upload a file"), gr.update(visible=source_type == "Use a URL")
+     source_type_transcription.change(fn=toggle_transcription_inputs, inputs=source_type_transcription, outputs=[audio_file_input, audio_url_input])
+
+ if __name__ == "__main__":
+     demo.queue(default_concurrency_limit=20, max_size=40)
+     demo.launch(debug=True, max_threads=20)
c29ca011-87ff-45b0-8236-08d629812732.svg ADDED
requirements.txt ADDED
@@ -0,0 +1,55 @@
+ aiofiles==24.1.0
+ annotated-types==0.7.0
+ anyio==4.9.0
+ Brotli==1.1.0
+ certifi==2025.7.14
+ charset-normalizer==3.4.2
+ click==8.2.1
+ fastapi==0.116.1
+ ffmpy==0.6.0
+ filelock==3.18.0
+ fsspec==2025.7.0
+ gradio==5.37.0
+ gradio_client==1.10.4
+ groovy==0.1.2
+ h11==0.16.0
+ hf-xet==1.1.5
+ httpcore==1.0.9
+ httpx==0.28.1
+ huggingface-hub==0.33.4
+ idna==3.10
+ Jinja2==3.1.6
+ markdown-it-py==3.0.0
+ MarkupSafe==3.0.2
+ mdurl==0.1.2
+ numpy==2.3.1
+ orjson==3.11.0
+ packaging==25.0
+ pandas==2.3.1
+ pillow==11.3.0
+ pydantic==2.11.7
+ pydantic_core==2.33.2
+ pydub==0.25.1
+ Pygments==2.19.2
+ python-dateutil==2.9.0.post0
+ python-multipart==0.0.20
+ pytz==2025.2
+ PyYAML==6.0.2
+ requests==2.32.4
+ rich==14.0.0
+ ruff==0.12.3
+ safehttpx==0.1.6
+ semantic-version==2.10.0
+ shellingham==1.5.4
+ six==1.17.0
+ sniffio==1.3.1
+ starlette==0.47.1
+ tomlkit==0.13.3
+ tqdm==4.67.1
+ typer==0.16.0
+ typing-inspection==0.4.1
+ typing_extensions==4.14.1
+ tzdata==2025.2
+ urllib3==2.5.0
+ uvicorn==0.35.0
+ websockets==15.0.1
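One note on the SRT export this commit adds: the `HH:MM:SS,mmm` timestamps must keep the millisecond field in the 000-999 range, and rounding seconds and milliseconds separately can violate that for inputs just below a whole second (e.g. 1.9996 s). A minimal standalone sketch of the conversion that rounds once on the total milliseconds, for quick sanity checks outside the app:

```python
def format_srt_time(total_seconds: float) -> str:
    # Round once on the total; deriving each field from whole milliseconds
    # keeps the millisecond part in the valid 000-999 range.
    total_ms = round(total_seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, milliseconds = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{milliseconds:03d}"

print(format_srt_time(3725.5))   # 01:02:05,500
print(format_srt_time(1.9996))   # 00:00:02,000
```

These are the same `start`/`end` values that `generate_srt_file` reads from each transcription segment before writing the `sequence`, `start --> end`, and text lines of the `.srt` file.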