Custom AI models made in Bangladesh

In Bangladesh, there's a flourishing community of around 15 to 20 dedicated AI teams diligently crafting their own AI products. Cameras are everywhere, from city streets to factories, retail stores, and transportation systems, yet most video footage still requires human eyes to interpret it. At Kaz Software, we are developing advanced machine learning models in Bangladesh that allow computers to detect objects, recognize patterns, and even interpret actions within live or recorded video feeds. This capability transforms ordinary camera footage into intelligent data that can support law enforcement, smart cities, logistics, and security operations. One of the most interesting examples of this work involves a very familiar vehicle on the streets of Dhaka: the electric rickshaw, often jokingly called the “Tesla of Dhaka.”
The challenge: turning video into searchable intelligence
Video surveillance systems generate enormous volumes of footage every day. Across its thousands of cameras, a single city can produce thousands of hours of video every hour, far more than any human team can monitor effectively. Traditional surveillance systems simply record video. When something important happens, investigators must manually search through footage, often frame by frame, to locate relevant evidence. Modern artificial intelligence and machine learning model development changes this entirely. Instead of passively storing video, intelligent systems can automatically detect vehicles, identify objects and visual features, track movement across frames, and search video archives based on descriptive queries. This is the kind of AI software development in Bangladesh that Kaz Software is actively advancing.
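Because running a detector on every frame of every feed is wasteful, pipelines of this kind usually analyze only a few frames per second. As a minimal illustration (the function name and rates are ours, not a Kaz Software API), here is how the sampled frame indices for a downsampled stream can be computed:

```python
def sample_frame_indices(total_frames: int, source_fps: float, target_fps: float) -> list:
    """Return the frame indices to keep when downsampling a video stream.

    Analyzing every frame of a 30 fps feed is rarely necessary for
    object detection, so pipelines typically sample a lower rate.
    """
    if source_fps <= 0 or target_fps <= 0:
        raise ValueError("frame rates must be positive")
    step = max(1, round(source_fps / target_fps))
    return list(range(0, total_frames, step))

# A 10-second clip at 30 fps, sampled at 2 fps -> keep every 15th frame.
indices = sample_frame_indices(total_frames=300, source_fps=30, target_fps=2)
print(len(indices))  # 20 sampled frames
```

Even at this reduced rate, a city-scale deployment still produces far more detections than humans could review, which is why the results must be stored as searchable metadata rather than watched.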
Teaching AI models made in Bangladesh to recognize the “Tesla of Dhaka”
Dhaka’s electric rickshaws have become a distinctive part of the city's transportation ecosystem. While they are easy for humans to recognize, teaching AI models made in Bangladesh to identify them requires careful engineering. The process begins with data collection and annotation. Thousands of images and video clips containing electric rickshaws are gathered from multiple sources. These images are then manually labeled so the model learns what features define the vehicle. Electric rickshaws have several unique characteristics. These include the shape of the passenger cabin, the structure of the rear wheels, the roof and frame configuration, and distinct lighting or body patterns.
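The manual labeling step typically produces pixel-space bounding boxes, which are then converted into the normalized form most detection trainers expect. A small sketch of that conversion (the YOLO-style center/width/height convention is a common choice; the exact format Kaz Software uses is not stated in this article):

```python
def to_normalized_box(x_min, y_min, x_max, y_max, img_w, img_h):
    """Convert a pixel-space bounding box to a normalized
    (center_x, center_y, width, height) tuple, the YOLO-style
    label format many object detection trainers consume."""
    cx = (x_min + x_max) / 2 / img_w
    cy = (y_min + y_max) / 2 / img_h
    w = (x_max - x_min) / img_w
    h = (y_max - y_min) / img_h
    return (cx, cy, w, h)

# An annotator marks an electric rickshaw at pixels (320, 180)-(960, 900)
# in a 1920x1080 video frame.
print(to_normalized_box(320, 180, 960, 900, 1920, 1080))
```

Normalizing by image size keeps the labels valid when frames are resized during training, which matters when footage arrives from cameras with different resolutions.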
A deep learning model, typically based on convolutional neural networks or transformer-based vision architectures, is then trained to recognize these patterns across different conditions. The training data intentionally includes day and night footage, rainy and low-light conditions, different camera angles, and vehicles that are partially obstructed. Through repeated training cycles, the model learns to reliably detect electric rickshaws in complex urban environments.
Moving beyond vehicles: detecting objects and visual clues
Object detection models become far more powerful when they can recognize combinations of visual features rather than just single objects. For example, investigators may need to locate something very specific within a large video archive, such as the request: find a white van with a red tissue box visible on the dashboard. For a human investigator, this could mean scanning hundreds of hours of footage. For an AI system trained through custom machine learning model development, this type of query becomes possible because the system understands multiple visual signals simultaneously. The model can detect vehicle type and color, interior dashboard objects, motion patterns, and location and timestamp data. By combining these signals, the system can quickly retrieve video segments that match the description.
This effectively turns raw video footage into a searchable visual database.
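The "white van with a red tissue box" query works because each detection is stored as a structured record of attributes, which can then be filtered like any database. A minimal sketch of that idea (the `Detection` schema and field names here are illustrative, not Kaz Software's actual data model):

```python
from dataclasses import dataclass, field

@dataclass
class Detection:
    """One structured record emitted by the vision pipeline."""
    video_id: str
    timestamp_s: float
    object_type: str
    color: str
    contains: set = field(default_factory=set)  # smaller objects seen in/on it

def search(archive, object_type=None, color=None, must_contain=()):
    """Return detections matching every requested attribute."""
    results = []
    for d in archive:
        if object_type and d.object_type != object_type:
            continue
        if color and d.color != color:
            continue
        if any(item not in d.contains for item in must_contain):
            continue
        results.append(d)
    return results

archive = [
    Detection("cam01.mp4", 12.4, "van", "white", {"red tissue box"}),
    Detection("cam01.mp4", 98.0, "van", "blue"),
    Detection("cam02.mp4", 45.2, "rickshaw", "green"),
]

# "Find a white van with a red tissue box visible on the dashboard."
hits = search(archive, object_type="van", color="white",
              must_contain=["red tissue box"])
print([(d.video_id, d.timestamp_s) for d in hits])  # [('cam01.mp4', 12.4)]
```

In production the matching would run over millions of records in an indexed store rather than a Python list, but the principle is the same: the expensive vision work happens once at ingestion, and every later query is a cheap metadata lookup.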
Detecting actions and events in video streams
Recognizing objects is only part of the challenge. Modern AI systems can also learn to detect actions and behaviors over time. This involves analyzing sequences of frames rather than single images. By studying motion and context, machine learning models can detect patterns such as vehicles stopping in unusual locations, objects being transferred between individuals, suspicious movement patterns, or traffic related incidents. Technically, this requires combining computer vision models with temporal analysis using architectures such as 3D convolutional neural networks, vision transformers with temporal embeddings, or sequence based models that evaluate motion across frames. These systems allow AI to interpret not just what appears in a frame, but what is happening across time.
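As a simplified stand-in for the temporal models above, even a rule over tracked positions can flag one of the listed behaviors: a vehicle stopping somewhere for an unusually long time. The sketch below assumes a tracker has already produced per-frame (x, y) positions; the thresholds are illustrative, not tuned values from a real deployment:

```python
import math

def stopped_intervals(track, min_frames=30, max_drift=5.0):
    """Scan a per-frame track of (x, y) positions and return
    (start_frame, end_frame) spans where the object moved less than
    `max_drift` pixels per frame for at least `min_frames` frames,
    a crude proxy for 'vehicle stopped' events."""
    intervals = []
    start = 0
    for i in range(1, len(track)):
        if math.dist(track[i], track[i - 1]) > max_drift:
            if i - start >= min_frames:
                intervals.append((start, i - 1))
            start = i
    if len(track) - start >= min_frames:
        intervals.append((start, len(track) - 1))
    return intervals

# 10 moving frames, 40 stationary frames, then movement again.
track = [(i * 20.0, 0.0) for i in range(10)] + [(200.0, 0.0)] * 40 + [(400.0, 0.0)]
print(stopped_intervals(track, min_frames=30))  # [(10, 49)]
```

Learned temporal models replace hand-written rules like this one with patterns inferred from labeled event data, but both operate on the same idea: the signal lives in how detections change across frames, not in any single frame.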
The architecture behind intelligent video analysis
Building these systems requires a multi-layer AI pipeline that processes video streams efficiently and accurately. A typical architecture includes several stages. Video streams are ingested from cameras or stored archives. Frames are extracted and passed through object detection models that identify vehicles, people, and other objects.
Additional models analyze detected objects to classify attributes such as color, shape, and contextual details. The system then stores structured metadata describing each scene, effectively transforming video into a structured dataset of visual events.
Finally, a search layer allows investigators or analysts to query this data using natural language descriptions. Instead of watching hours of video, users can ask questions such as “show all electric rickshaws entering this intersection between 10 PM and midnight” or “find a white van with visible dashboard objects.”
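Once the query is parsed, the "10 PM to midnight" example above reduces to filtering the stored metadata by object class and time window. A minimal sketch of that final lookup (the event schema and camera names are made up for illustration):

```python
from datetime import datetime

def query_events(events, object_type, start, end):
    """Filter pipeline metadata for one object class inside a time
    window - the lookup behind a query like 'show all electric
    rickshaws entering this intersection between 10 PM and midnight'."""
    return [
        e for e in events
        if e["object_type"] == object_type and start <= e["time"] < end
    ]

events = [
    {"object_type": "electric_rickshaw",
     "time": datetime(2024, 5, 1, 22, 15), "camera": "intersection-7"},
    {"object_type": "electric_rickshaw",
     "time": datetime(2024, 5, 1, 14, 5), "camera": "intersection-7"},
    {"object_type": "van",
     "time": datetime(2024, 5, 1, 23, 40), "camera": "intersection-7"},
]

night = query_events(
    events, "electric_rickshaw",
    start=datetime(2024, 5, 1, 22, 0),
    end=datetime(2024, 5, 2, 0, 0),
)
print(len(night))  # 1 matching event
```

The natural-language front end only has to translate a sentence into these structured filters; all the heavy vision work was already done when the metadata was written at ingestion time.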
Why custom machine learning models matter
Many generic AI tools exist today, but real-world environments, especially in rapidly evolving cities like Dhaka, require custom machine learning model development. Vehicles, street layouts, traffic patterns, and camera environments vary significantly between regions. Off-the-shelf models trained on Western datasets often perform poorly in South Asian urban environments. At Kaz Software, our approach focuses on localized AI model training, where datasets are carefully curated to reflect the actual visual conditions of the region. This results in models that are far more accurate and practical for real-world deployments.
AI software development in Bangladesh is advancing rapidly
Artificial intelligence research and development in Bangladesh has accelerated significantly over the past decade, driven by the growth of the country’s technology sector, expanding digital infrastructure, and a rapidly increasing pool of software engineers and data scientists. Bangladesh’s ICT industry now contributes billions of dollars annually to the national economy and includes thousands of registered technology firms working in software development, data services, and emerging AI technologies. Universities such as BUET, Dhaka University, and several private institutions are also producing a growing number of graduates specializing in machine learning, computer vision, and data science, strengthening the local talent pipeline.
This momentum is reflected in the expanding adoption of artificial intelligence across sectors including fintech, agriculture, manufacturing, and smart city infrastructure. According to government and industry initiatives such as the “Digital Bangladesh” program and the national AI strategy roadmap, Bangladesh is actively investing in data infrastructure, innovation hubs, and technology exports in order to position itself as a regional center for advanced software development.
Within this ecosystem, Kaz Software is contributing to the advancement of custom AI and machine learning solutions developed locally but applicable globally. By building systems that combine computer vision research, large-scale machine learning model training, and scalable cloud-based AI infrastructure, companies like Kaz Software demonstrate that sophisticated AI platforms can be engineered outside traditional technology hubs. These efforts highlight a broader shift in the global AI landscape, where emerging technology markets such as Bangladesh are increasingly moving beyond outsourcing toward creating original AI products, platforms, and intellectual property.
From footage to insight: the vision behind Omnivisia

The challenge of extracting meaningful information from video data is exactly what inspired another initiative at Kaz Software called Omnivisia. Around the world, millions of cameras capture enormous amounts of video every day across cities, businesses, factories, and infrastructure systems. By some estimates, global surveillance systems alone generate around 2.5 exabytes of video data per day, yet the overwhelming majority of that footage is never meaningfully analyzed. This creates a massive paradox. Organizations spend billions installing cameras, expanding coverage, and storing video data, yet most of the information captured remains effectively invisible because humans simply cannot watch it all. Security operators can only monitor a handful of camera feeds effectively, and manually reviewing hours of footage to find a few seconds of relevant activity can take many hours. Omnivisia aims to solve this problem by transforming video archives into searchable, intelligent data systems. Instead of forcing humans to watch endless recordings, AI models analyze the footage automatically and extract meaningful patterns, objects, events, and behaviors. The result is a system where users can search video in the same way they search the web.
Imagine being able to ask a system questions like “show all vehicles entering this gate between midnight and 2 AM” or “find instances where a delivery truck stopped for more than five minutes.” Instead of manually scrubbing through video timelines, investigators or analysts can instantly retrieve relevant moments from massive video datasets.
This approach reflects a broader shift in how organizations think about visual data. The real value of video is not simply recording events but understanding what is happening inside those recordings. By combining computer vision, machine learning, and scalable AI infrastructure, Omnivisia represents Kaz Software’s effort to transform passive surveillance footage into actionable intelligence. In many ways, the technologies discussed earlier in this article, such as object detection, action recognition, and visual search, form the foundation of systems like Omnivisia. They represent the next stage in the evolution of video systems: from cameras that merely record the world to platforms that can actually understand it.
The future: intelligent cameras and smart cities
As machine learning models become more sophisticated, video systems will evolve into intelligent infrastructure. Instead of passive surveillance cameras, cities will operate networks of smart sensors capable of understanding activity in real time. This technology can support traffic management, public safety operations, urban planning and mobility analysis, and infrastructure monitoring. At Kaz Software, we see custom machine learning model development as a critical part of building that future. From detecting the “Tesla of Dhaka” on busy city streets to enabling powerful visual search capabilities, our work represents a step toward machines that truly understand the visual world. And it shows that AI innovation is happening right here in Bangladesh.
FAQ
What does it mean to build AI models made in Bangladesh?
AI models made in Bangladesh refer to machine learning systems that are designed, trained, and engineered by local software teams using regional datasets and infrastructure. Instead of relying only on imported AI technology, Bangladeshi companies are developing their own computer vision and data analysis systems that solve real problems in local environments such as traffic monitoring, security, logistics, and urban management.
How can AI detect objects and actions in video footage?
AI systems analyze video by breaking it into individual frames and running computer vision models that identify objects such as vehicles, people, and other items in the scene. Additional models track how these objects move over time to understand actions or events. This allows the system to recognize patterns such as vehicles entering restricted areas, objects being left behind, or unusual movement.
Why is it difficult for humans to monitor surveillance video effectively?
Modern cities generate massive amounts of camera footage every day. Security operators can only watch a limited number of screens at once, and reviewing recorded footage manually can take hours. Because of this, most video data is never analyzed in detail. AI systems help by automatically scanning the footage and identifying relevant events so humans only review the important segments.
How can AI search video footage using descriptions like “a white van with a red tissue box”?
Advanced computer vision models can detect many visual attributes at the same time such as vehicle type, color, and smaller objects inside the vehicle. When a user searches for something specific, the system scans the stored metadata created during analysis and retrieves the video segments that match those features. This makes large video archives searchable in a similar way to a search engine.
What role can companies in Bangladesh play in global AI development?
Bangladesh has a rapidly growing technology sector with a large pool of software engineers and increasing expertise in machine learning and data science. As companies develop custom AI platforms, computer vision systems, and data infrastructure locally, they can build solutions that serve both domestic and international markets. This shift allows Bangladesh to move beyond outsourcing and contribute original AI products and research to the global technology ecosystem.



