Mati – Soko https://sokosolutions.com Innovation & Talent Tue, 10 Oct 2023 00:46:56 +0000
Corpus Analyzer https://sokosolutions.com/2023/09/28/corpus-analyzer/ Thu, 28 Sep 2023 23:58:37 +0000 https://sokosolutions.com/?p=1562
large language models

Corpus analyzer

The goal: Identify and summarize the contents of a set of PDF documents and present them in a conversational format.

Technical procedure

1. Document preprocessing:

Text is extracted from each file uploaded, skipping any embedded media on the document.

2. Splitting:

The text content is split into smaller chunks so that each chunk keeps a meaningful piece of information and fits within the context window of the language model.
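A minimal sketch of the splitting step, assuming chunks are measured in whitespace-separated words as a rough stand-in for model tokens, with a small overlap so that text cut at a boundary keeps some context. The function name, the default sizes and the overlap are illustrative assumptions, not details taken from the app.

```python
def split_into_chunks(text: str, chunk_size: int = 200, overlap: int = 20) -> list[str]:
    """Split text into word-based chunks of at most chunk_size words.

    Consecutive chunks share `overlap` words (an assumed design choice).
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")

    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current chunk already reaches the end of the text,
        # so no trailing chunk made only of overlap words is emitted.
        if start + chunk_size >= len(words):
            break
    return chunks


sample = " ".join(f"w{i}" for i in range(7))
print(split_into_chunks(sample, chunk_size=4, overlap=1))
```

A production splitter would normally count model tokens rather than words and prefer to break on paragraph or sentence boundaries.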

3. Document summarization:

Every set of chunks is recursively summarized using GPT-3.5 Turbo. A prefix stating that it is a summary is prepended. 
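The recursive summarization can be sketched as follows. This is an illustration under assumptions: `summarize` stands for any function that sends text to a language model and returns a shorter text (the app uses GPT-3.5 Turbo; no API call is shown here), and `group_size` and the exact prefix wording are hypothetical.

```python
from typing import Callable

SUMMARY_PREFIX = "Summary: "  # assumed wording of the prefix


def _reduce(texts: list[str], summarize: Callable[[str], str], group_size: int) -> str:
    # Summarize each group of texts, then recurse on the resulting summaries
    # until a single summary remains.
    groups = [texts[i:i + group_size] for i in range(0, len(texts), group_size)]
    summaries = [summarize("\n\n".join(group)) for group in groups]
    if len(summaries) == 1:
        return summaries[0]
    return _reduce(summaries, summarize, group_size)


def summarize_recursively(chunks: list[str], summarize: Callable[[str], str],
                          group_size: int = 4) -> str:
    """Return one summary for all chunks, marked with a summary prefix."""
    if not chunks:
        raise ValueError("at least one chunk is required")
    if group_size < 2:
        raise ValueError("group_size must be at least 2 so each level shrinks")
    return SUMMARY_PREFIX + _reduce(list(chunks), summarize, group_size)


def fake_summarize(text: str) -> str:
    # Stand-in for a model call: joins the pieces instead of condensing them.
    return text.replace("\n\n", "+")


print(summarize_recursively(["a", "b", "c", "d", "e"], fake_summarize, group_size=2))
```

Grouping keeps every model call within the context window, at the cost of one extra round of calls per level.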

4. Vectorization:

Each chunk is vectorized using OpenAI’s embeddings to obtain a numerical representation of the content that captures its semantics and main keywords.

 

5. Storage:

All chunks and their summaries are stored in Chroma, a vector database that makes it easier to find content similar to a query.

6. Clustering tool:

Summaries are clustered to find common topics among them. First, features are extracted from the text using term frequency–inverse document frequency (TF-IDF); then truncated singular value decomposition (SVD) is applied to reduce the dimensionality. The clustering algorithm is DBSCAN. Once clusters are detected, a brief summary is built for each group, emphasizing what its documents have in common. Along with the descriptions, the tool reports the number of documents and the document names in each category.
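The clustering idea can be illustrated with a small, dependency-free sketch. It is a simplification under stated assumptions: TF-IDF and DBSCAN are written out by hand, cosine distance is used as the metric, the truncated SVD step is omitted for brevity, and the `eps` and `min_samples` values and the sample summaries are made up. A real implementation would more likely rely on a machine learning library.

```python
import math
from collections import Counter


def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    """Return L2-normalized sparse TF-IDF vectors (smoothed IDF)."""
    tokenized = [doc.lower().split() for doc in docs]
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    n_docs = len(docs)
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vec = {t: c * (math.log((1 + n_docs) / (1 + doc_freq[t])) + 1.0)
               for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def cosine_distance(a: dict[str, float], b: dict[str, float]) -> float:
    # Vectors are unit length, so the dot product is the cosine similarity.
    return 1.0 - sum(w * b.get(t, 0.0) for t, w in a.items())


def dbscan(vectors: list[dict[str, float]], eps: float, min_samples: int) -> list[int]:
    """Label each vector with a cluster id, or -1 for noise."""
    noise = -1
    labels: list = [None] * len(vectors)

    def neighbors(i: int) -> list[int]:
        return [j for j in range(len(vectors))
                if cosine_distance(vectors[i], vectors[j]) <= eps]

    cluster = -1
    for i in range(len(vectors)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = noise
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == noise:
                labels[j] = cluster  # border point: joins but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_samples:  # core point: keep expanding
                queue.extend(reachable)
    return labels


summaries = [
    "car loan interest rate financing",
    "car financing promotion interest rate",
    "software developer resume python experience",
    "resume software engineer python skills",
    "quantum banana",
]
print(dbscan(tfidf_vectors(summaries), eps=0.7, min_samples=2))  # [0, 0, 1, 1, -1]
```

DBSCAN suits this task because the number of topics does not need to be known in advance, and documents that match no group are reported as noise instead of being forced into a cluster.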

7. Agent:

A conversational agent built on GPT-4 has access to the vector database and the clustering tool. It can formulate several internal questions or queries to compose an answer to a user question.

User instructions

1. Document upload:

The user can upload the PDF documents to analyze into the “Input” box. If needed, more documents can be uploaded after the previous upload has been completed.

2. Chatbot interaction:

After the documents are uploaded, the user can formulate questions about the documents in natural language. At any time the conversation can be cleared by the user, and optionally the files can also be cleared to start from the beginning.

Download the example document

Documents certifying the establishment of companies in Chile, car financing promotions and resumes

Click here to upload the documents

Questions

  1. What are these documents about?
  2. How can these documents be grouped? Make a detailed list
  3. Reduce the amount of categories to three
  4. Which file names belong to each of these categories?
  5. What is the price for each car model offered?
  6. Which model has the lowest interest rate?
  7. What are the main differences between the developers?
  8. Make a brief description of Chahuan y Filippi Limitada
  9. Which Chilean company has the largest capital?

Web research https://sokosolutions.com/2023/09/28/web-research/ Thu, 28 Sep 2023 23:34:14 +0000 https://sokosolutions.com/?p=1550
large language models

Web research

The goal: Conduct research on the companies specified by the user and present the reports in a table format. The information provided by the tool includes: website address, company logo, funding, annual revenue, most-starred GitHub repository and a summary of their activities.

Technical procedure

1. Context retrieval:

The app performs multiple Google searches across specialized websites to obtain the appropriate data.

2. Document summarization:

Every chunk of the web page content is summarized using GPT-3.5 Turbo, and all the summaries are then combined into the final summary.

3. Webpage rendering:

Each web page is rendered using Selenium to avoid losing information on JavaScript-rendered pages.

4. Data agents:

There are 7 agents specialized in obtaining each piece of information (e.g. summary, logo, annual revenue, etc.). Each of these is capable of doing Google searches, rendering web pages and deducing in one or more iterations what the correct answer is.

5. Generate response:

The user’s question and the retrieved context are sent as input to the GPT model to generate a response in natural language based on this input. The generated response is presented to the user through the chatbot interface.

User instructions

1. Chatbot interaction:

The user can provide company names to the chatbot interface, and the app will display all the information it can obtain in a table format. The user can also specify which data fields they need.

2. Batch request:

The user can provide an email address and a list of companies to process. Once processing has finished, the results are sent to the provided email address.

Questions

  1. Openai
  2. Openai, Flair, Facet ai
  3. Give me the annual revenue of Openai, Flair, Facet ai
  4. Give me the Logo and funding of Openai, Flair, Facet ai

Chat with your data https://sokosolutions.com/2023/09/28/chat-with-your-data-2/ Thu, 28 Sep 2023 22:55:01 +0000 https://sokosolutions.com/?p=1537
large language models

Chat with your data

Provide users with a convenient and interactive way to access information within PDF documents

Technical procedure

Document preprocessing: When a user uploads a PDF document, the app preprocesses it to extract and structure the text content.

Chunking: The text content from the document is divided into smaller, manageable chunks. The chunk size multiplied by the number of relevant chunks selected must not exceed the maximum context window supported by the underlying language model (GPT-3.5).
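The chunking constraint is simple arithmetic, sketched below. The numbers are assumptions chosen for illustration: a 4,096-token context window, 500-token chunks, and some tokens set aside for the prompt instructions and for the model's answer (that reservation is a refinement not stated in the text above).

```python
def max_chunks_in_context(context_window: int, chunk_size: int,
                          prompt_overhead: int = 0, response_reserve: int = 0) -> int:
    """Return how many chunks of chunk_size tokens fit in the context window.

    prompt_overhead and response_reserve are hypothetical parameters for the
    tokens used by instructions and by the generated answer.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    available = context_window - prompt_overhead - response_reserve
    return max(available, 0) // chunk_size


# 4096 - 300 - 500 = 3296 tokens available, and 3296 // 500 = 6 chunks.
print(max_chunks_in_context(4096, 500, prompt_overhead=300, response_reserve=500))  # 6
```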

Vectorization: Each chunk of text is converted into a numerical vector representation using word embedding techniques. These vectors capture the semantic meaning of the text and allow for efficient searching and retrieval.

Storage: The vector representations of the text chunks are stored in a vector database. This database serves as a repository of contextual information from the documents.

Context retrieval: The app performs a similarity search within the database of vectorized documents to find the most relevant chunks based on the vector representation of the user’s question. These retrieved chunks serve as context for generating a relevant response.
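As an illustration of the retrieval step, the sketch below ranks stored chunks by cosine similarity to the question vector and keeps the top k. The three-dimensional vectors and chunk texts are invented for the example; in the app, the vectors would come from the embedding model and the search would be run by the vector database rather than by a Python loop.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_chunks(query_vector: list[float],
                 store: list[tuple[str, list[float]]], k: int = 3) -> list[str]:
    """Return the texts of the k stored chunks most similar to the query."""
    ranked = sorted(store,
                    key=lambda item: cosine_similarity(query_vector, item[1]),
                    reverse=True)
    return [text for text, _ in ranked[:k]]


# Toy store of (chunk text, embedding) pairs; the values are made up.
store = [
    ("The registered capital is ...", [1.0, 0.0, 0.0]),
    ("The notary public certified ...", [0.0, 1.0, 0.0]),
    ("Each partner contributed ...", [0.9, 0.1, 0.0]),
]
print(top_k_chunks([1.0, 0.0, 0.0], store, k=2))
```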

Chat history: Because the app is a chatbot, it saves the chat history and uses it to rewrite each new question as a standalone question.
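One way to picture the chat history step is the sketch below. It is an assumption-laden illustration: `llm` stands for any function that sends a prompt to the language model and returns its text, and the prompt wording and function names are hypothetical rather than taken from the app.

```python
from typing import Callable

CONDENSE_TEMPLATE = (
    "Given the conversation below, rewrite the follow-up question so that it "
    "can be understood on its own.\n\n"
    "Conversation:\n{history}\n\n"
    "Follow-up question: {question}\n"
    "Standalone question:"
)


def condense_question(history: list[tuple[str, str]], question: str,
                      llm: Callable[[str], str]) -> str:
    """Rewrite a follow-up question as a standalone question.

    history is a list of (speaker, message) pairs. With no history the
    question is already standalone, so the model is not called.
    """
    if not history:
        return question
    transcript = "\n".join(f"{speaker}: {message}" for speaker, message in history)
    prompt = CONDENSE_TEMPLATE.format(history=transcript, question=question)
    return llm(prompt).strip()
```

The standalone question, not the raw follow-up, is then embedded and used for the similarity search, so that a follow-up such as "and how was it contributed?" still retrieves the right chunks.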

Generate response: The user’s question and the retrieved context are sent as input to the GPT model to generate a response in natural language based on this input. The generated response is presented to the user through the chatbot interface.

User instructions

1. Document upload:

The user begins by uploading one or more PDF documents into the app. These documents contain the information the user wants to access and query.

2. Chatbot interaction:

After the documents are uploaded, the user interacts with the chatbot interface provided by the app. The user can type questions in natural language to the chatbot. The user can continue to ask questions and receive responses, creating an iterative conversation with the chatbot.

Download the example document

Click here to upload the document

Questions

  1. What is the full name of the notary public who certified this document?
  2. Who are the individuals involved in establishing the 'CHAHUAN Y FILIPPI LIMITADA' company?
  3. What is the registered capital of 'CHAHUAN Y FILIPPI LIMITADA' and how was it contributed?
  4. What is the stated business objective or purpose of the company?
  5. Where is the registered office of the company?
  6. What is the extent of liability for the partners in this Limited Liability Company (Sociedad de Responsabilidad Limitada)?
  7. Who can administrate, represent, and use the company's business name according to the document?
  8. Is there a specified term for the existence of 'CHAHUAN Y FILIPPI LIMITADA'?
  9. What is the date of the document's certification?

Harness the Power of AI for Statutes with Cognitive Data Capture https://sokosolutions.com/2023/09/28/harness-the-power-of-ai-for-statutes-with-cognitive-data-capture/ Thu, 28 Sep 2023 21:43:04 +0000 https://sokosolutions.com/?p=1535
data capture

Harness the Power of AI for Statutes with Our Cognitive Data Capture Solution

Soko helps clients stay ahead of financial crimes with their participation reporting service. Their expert team focuses on reducing fraud, criminal activity, and money laundering. With this service, clients can quickly access a person’s involvement in other companies, gaining an advantage in preventing financial crimes.

Technologies:
Algorithms: Transformers, Embeddings, TL-GAN, GAN-based noise, YOLO, Faster R-CNN, NER (Named entity recognition), LSTM
Libraries: TensorFlow, AsanteOCR, Camelot, QR detection, OpenCV, spaCy
Development: MEAN stack (Mongo, Express, Angular, Node), FastAPI

The Challenge

The client, a team of experts in compliance, technology, risk and data management, aims to reduce financial crime and to help its own clients protect themselves from fraudsters, criminals, terrorists and money launderers.

For this purpose, the client offers a company participation reporting service. Given the RUT of a natural or legal person, a report is prepared that shows all the companies of which it is (or was) a part and the percentage of participation that corresponds to said person.

The company has a team of 30 people who review the history of commercial statutes published in the Official Gazette of Chile and enter them manually into a web form designed for this purpose. The work is highly repetitive, prone to errors because of the lexical complexity with which notaries write these documents, and extremely time-consuming, since it requires processing a history of 3 million commercial statutes.

The process of drilling in mining consists of obtaining a soil sample by diamond drilling. These samples, which easily reach thousands of feet, are placed in trays intended for this purpose, tabulated and high-resolution photographs are taken. The geologist visually detects and counts fractures, classifying them as natural or induced, depending on whether they are real fractures existing in the earth layers or were caused by drilling or moving the samples. 

The Solution

The solution proposed by the Mototech team began with training multiple machine learning models, which involved manually labeling each of the entities in a total of 1,000 corporate bylaws.

The project was divided into 3 stages according to the type of document (statutes of creation, statutes of modification and statutes of dissolution) so that our data science team could concentrate on specific models.

The development team implemented a web application that presents users with the corporate bylaws whose detected entities did not pass the automatic validations. The user then reviews them (or updates them if necessary), ensuring that erroneous data is not inserted into the database.
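The text does not say which automatic validations are applied. As a purely hypothetical example of one such check, the sketch below verifies the check digit of a Chilean RUT with the standard modulo 11 rule and routes a document to manual review when any extracted RUT fails. The function names, the `entities` structure and the routing labels are assumptions made for illustration.

```python
def rut_check_digit(number: int) -> str:
    """Compute the modulo 11 check digit for the numeric part of a RUT."""
    total, factor = 0, 2
    while number > 0:
        total += (number % 10) * factor
        number //= 10
        factor = 2 if factor == 7 else factor + 1  # factors cycle 2..7
    remainder = 11 - total % 11
    return {11: "0", 10: "K"}.get(remainder, str(remainder))


def is_valid_rut(rut: str) -> bool:
    """Accept formats such as '12.345.678-5' or '123456785'."""
    cleaned = rut.replace(".", "").replace("-", "").strip().upper()
    if len(cleaned) < 2 or not cleaned[:-1].isdigit():
        return False
    return rut_check_digit(int(cleaned[:-1])) == cleaned[-1]


def route_document(entities: dict) -> str:
    """Auto-approve only when at least one RUT was found and all are valid."""
    ruts = entities.get("ruts", [])
    if ruts and all(is_valid_rut(rut) for rut in ruts):
        return "auto_approved"
    return "manual_review"


print(route_document({"ruts": ["12.345.678-5", "11.111.111-1"]}))  # auto_approved
print(route_document({"ruts": ["12.345.678-9"]}))                  # manual_review
```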

Our data scientists selected semantic segmentation algorithms for image processing and model training with the previously labeled data.

CRISP-DM methodology was used and at each completed iteration benchmarking was performed with different experts (geologists) to feed back the model with new training.

Our solution also included a web service to enable interoperability, i.e., so that the proposed system can be consumed by and integrated with the client entity’s systems. Additionally, we made the solution scalable through the automation practices known as MLOps, providing capabilities for continuous model retraining and for identifying issues that could affect the solution in a production environment.

Results

What we achieved

Thanks to the model implemented by Mototech, the client was able to process the history of 3 million corporate statutes in 3 months of execution. 70% of the documents were approved automatically, without requiring manual intervention.

The remaining 30% of the documents went through the manual process, where validating or updating the recognized entities took an average of 1 minute per document.

Geologists have a visual tool that preloads (in less than a second) all the fractures detected in an image, and their task is reduced to verifying this information in an agile and efficient way.

In addition to reducing analysis time, the introduction of the prototype lowered the seniority level required for this task and helped unify the criteria for fracture selection.

In real-world testing, system users report that response capacity and quality of analysis improve significantly, reducing the average time a claim spends waiting to be classified from a week to less than 5 seconds, resulting in greater efficiency in the management process and financial market supervision in general.

A Journey to the Top of the Cloud: SURA’s amazing success in its migration in just 6 months. https://sokosolutions.com/2023/08/30/a-journey-to-the-top-of-the-cloud-suras-amazing-success-in-its-migration-in-just-6-months/ Wed, 30 Aug 2023 00:16:08 +0000 https://sokosolutions.com/?p=1231

A Journey to the Top of the Cloud:
SURA’s amazing success in its migration in just 6 months

Search, processing and analysis of high-volume data in real time with the Soko Solutions Big Data platform.

The Challenge

SURA was operating on local servers (on-premise), which limited its scalability and availability. The company decided to migrate to the native cloud as part of a corporate decision, with the aim of improving efficiency, reducing operational costs and ensuring the adaptability of its services. In addition, they sought to eliminate dependency on WSO2’s costly and highly specialized platform.

The Solution

Our team of experts (Azure Cloud Platform architects, WSO2 experts, developers and testers) developed, in record time, a “Specialized Migrator” from WSO2 to Node.js, integrated with Microsoft Azure’s Logic Apps, Azure Functions and serverless AKS.

This strategy not only reduced the entire migration process from an estimated 24 months to only 6 months, including rigorous performance testing, but also gives SURA a modern, high-performance “Flexible Integration” infrastructure with a strong orientation toward “Open Insurance”.

Results

Thanks to our solution, SURA achieved significant benefits:

Time and cost savings

The migration to the cloud was completed in only 6 months, well under the roughly two years that a project of this magnitude was estimated to take. This improved time to market. In addition, using cloud-native services produced significant savings on virtual machines, and infrastructure costs were reduced by 70%.

High performance and scalability

The cloud-native application developed in Node.js demonstrated extremely high performance, thanks to the best migration practices implemented. This allowed SURA to process large volumes of transactions in an efficient manner, ensuring greater growth capacity for the company.

Expertise

Our team of architects and developers provided SURA with in-depth knowledge of modern architecture and tools, which guaranteed a successful implementation and optimal management of the platform through performance testing in the cloud.

 
In addition to the migration, we provided SURA with further services, such as architecture, DevOps and security, to ensure that the company received a complete and customized solution.
FEMSA. The Strategic Path to Automation https://sokosolutions.com/2023/08/29/the-strategic-path-to-automation/ Tue, 29 Aug 2023 23:55:56 +0000 https://sokosolutions.com/?p=1221

FEMSA:
The Strategic Path to Automation

The Challenge

Robotic Process Automation (RPA) aims to revolutionize manual, repetitive, and inefficient processes by reducing costs and minimizing errors. However, it is crucial for businesses to avoid isolated RPA initiatives and embark on their automation journey with a well-defined goal and strategic plan in place.

Our Approach

We specialize in identifying automatable processes and developing software robots to execute repetitive, rule-based tasks with efficiency and precision.

Initial exploration

We examine all automation possibilities, focusing on the company's desired areas for automation and their associated processes, including cost, resource allocation, time, effort, and complexity.

Proof of Concept Evaluation and Subsequent Proposal

We assess the feasibility of the automation project and present a comprehensive work proposal.

Operational Model Implementation

We establish a governance model to manage the automated processes effectively.

Our approach is grounded in agile methodologies and client-agreed sprints, ensuring rapid iterations and delivery of impactful results. We offer end-to-end support, equipping you with tools, training, and ongoing consultation to efficiently manage your automated processes in compliance with established standards and regulations.

Efficiency - Cost Reduction - Improved Accuracy - Flexibility and Scalability

FEMSA, Latin America’s leading beverage and retail multinational, has partnered with us to integrate RPA technology into its operations. This collaboration enhances efficiency by automating tasks such as inventory management, financial reporting, and customer service, resulting in reduced costs and improved customer experiences. Together, we aim to optimize a minimum of 180 processes.

Chat with your data https://sokosolutions.com/2023/08/28/chat-with-your-data/ Mon, 28 Aug 2023 20:23:41 +0000 https://sokosolutions.com/?p=934

Chat with your data

Our goal
Provide users with a convenient and interactive way to access information within PDF documents.

Technical procedure

AFAB Lab Resources provides quality used laboratory equipment, supplies and services. AFAB Lab Resources, LLC is a wholesale and service company providing laboratory equipment services and logistics, including wholesale laboratory equipment sales, settlement assistance, service contracts, and equipment purchase and relocation services to the biotechnology and pharmaceutical industry.

The Problem

Because the current processes are mostly manual, the main objective of the project was to make routine tasks more time-efficient. In addition, knowledge of the business was concentrated in two people; processes that do not depend on specific individuals are required, taking into account security and access permissions.

The Solution

AFAB is working with Mototech as a solution partner in a continuous, iterative development process using agile methodology, which allows the system to be adjusted according to the lessons learned during their internal process discovery. So far, a first-level inventory management module has been developed, enabling AFAB to carry out all their real-life processes easily and quickly in the system, with complete tracking and traceability of every movement in their warehouses. The project is still ongoing.

Planned Benefits

Before the solution was implemented, there was no system for performing daily activities and tracking all sales and purchase movements. In addition, manually tracing the inventory consumed a lot of time, and there was no standardized process.
