News Aggregator


Apache Kafka + Flink + Snowflake: Cost-Efficient Analytics and Data Governance

Aggregated on: 2024-08-10 15:07:54

Snowflake is a leading cloud data warehouse that is evolving into a broader data cloud enabling a wide range of use cases. The major drawback of this evolution is the significantly growing cost of data processing. This blog post explores how data streaming with Apache Kafka and Apache Flink enables a "shift left architecture" where business teams can reduce cost, improve data quality, and process data more efficiently. The real-time capabilities and the unification of transactional and analytical workloads using Apache Iceberg's open table format enable new use cases and a best-of-breed approach without vendor lock-in, with a choice of analytical query engines like Dremio, Starburst, Databricks, Amazon Athena, Google BigQuery, or Apache Flink.

Snowflake and Apache Kafka

Snowflake is a leading cloud-native data warehouse. Its usability and scalability have made it a prevalent data platform in thousands of companies. This blog series explores different data integration and ingestion options, including traditional ETL/iPaaS and data streaming with Apache Kafka. The discussion covers why point-to-point Zero-ETL is only a short-term win, why Reverse ETL is an anti-pattern for real-time use cases, and when a Kappa Architecture and shifting data processing “to the left” into the streaming layer help to build transactional and analytical real-time and batch use cases in a reliable and cost-efficient way.

View more...

Demystifying the Magic: A Look Inside the Algorithms of Speech Recognition

Aggregated on: 2024-08-09 20:07:54

It seems every commercial device now features some implementation of, or an attempt at, speech recognition. From cross-platform voice assistants to transcription services and accessibility tools, and more recently a differentiator for LLMs — dictation has become an everyday user interface. With the market size of voice-user interfaces (VUI) projected to grow at a CAGR of 23.39% from 2023 to 2028, we can expect many more tech-first companies to adopt it. But how well do you understand the technology? Let's start by dissecting and defining the most common technologies that go into making speech recognition possible.

View more...

Why I Use RTK Query for API Calls in React

Aggregated on: 2024-08-09 18:52:54

The RTK Query part of the Redux Essentials tutorial is phenomenal, but since it’s part of a much larger suite of documentation, I feel like the gem that is RTK Query is getting lost.

What Is Redux?

Many people think of Redux as a state management library, which it is. To them, the main value of Redux is that it makes it possible to access (and change) the application state from anywhere in the application. This misses the point of using something like Redux, so let’s zoom out a bit and take another look.

View more...

Use Mistral AI To Build Generative AI Applications With Go

Aggregated on: 2024-08-09 17:07:53

Mistral AI offers models with varying characteristics across performance, cost, and more:

- Mistral 7B: The first dense model released by Mistral AI, perfect for experimentation, customization, and quick iteration
- Mixtral 8x7B: A sparse mixture-of-experts model
- Mistral Large: Ideal for complex tasks that require large reasoning capabilities or are highly specialized (Synthetic Text Generation, Code Generation, RAG, or Agents)

Let's walk through how to use these Mistral AI models on Amazon Bedrock with Go, and in the process, also get a better understanding of its prompt tokens.

View more...

Content Detection Technologies in Data Loss Prevention (DLP) Products

Aggregated on: 2024-08-09 15:52:53

Having worked with enterprise customers for a decade, I still see potential gaps in data protection. This article addresses the key content detection technologies in a Data Loss Prevention (DLP) product that developers need to focus on while developing a first-class solution. First, let’s look at a brief overview of the functionalities of a DLP product before diving into detection.

Functionalities of a Data Loss Prevention Product

The primary functionalities of a DLP product are policy enforcement, data monitoring, sensitive data loss prevention, and incident remediation. Policy enforcement allows security administrators to create policies and apply them to specific channels or enforcement points. These enforcement points include email, network traffic interceptors, endpoints (including BYOD), cloud applications, and data storage repositories. Sensitive data monitoring focuses on protecting critical data from leaking out of the organization's control, ensuring business continuity. Incident remediation may involve restoring data with proper access permissions, data encryption, blocking suspicious transfers, and more.

View more...

Connecting ChatGPT to Code Review Made Easy

Aggregated on: 2024-08-09 14:07:53

The era of artificial intelligence is already in bloom. Everyone working in IT is already familiar with our "new best friend" for development — AI. Working as a DevOps Engineer at Innovecs, I’d like to share one of my latest findings.

Concept

Would you like every pull/merge request to be checked by ChatGPT-4 first and then by you? Do you want instant feedback on code changes before your colleagues see them? How about detecting who committed confidential data or API keys, and where, with the ability to immediately tag the "culprit" for correction? We’re perfectly aware that GPT can generate code quite well... but it turns out it can review it just as smoothly! I will show you right away how this works in practice (parts of the code are blurred to avoid showing too much).

View more...

The Case for Working on Non-Glamorous Migration Projects

Aggregated on: 2024-08-08 23:07:53

In my 13 years of engineering experience, I have seen many people make career decisions based on the opportunity to work on a brand-new service. There is nothing wrong with that decision. However, today we are going to make a contrarian case for working on boring migration projects. What I did not realize early in my career was that most of my foundational software development learning came from migration projects — e.g., migrating an underlying data store to another cloud-based technology, or deprecating a monolithic service in favor of new microservices. This is because migrations are inherently hard: you are forced to meet, if not exceed, an existing bar on availability, scale, latency, and customer experience that was built and honed over the years by multiple engineers. You won’t face those constraints on a brand-new system because you are free to define them. Not only that, but no matter how thorough you are with migrations, there will be hidden skeletons in the closet to deal with when you switch over to new parts of the system. (Check out this interesting article on how DoorDash’s migration from Int to BigInt for a database field was fraught with blockers.)

View more...

Batch vs. Real-Time Processing: Understanding the Differences

Aggregated on: 2024-08-08 21:37:53

The decision between batch and real-time processing is a critical one, shaping the design, architecture, and success of our data pipelines. While both methods aim to extract valuable insights from data, they differ significantly in their execution, capabilities, and use cases. Understanding the key distinctions between these two processing paradigms is crucial for organizations to make informed decisions and harness the full potential of their data.  Key definitions can be summarized as follows:
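To make the contrast concrete, here is a minimal, framework-free Python sketch (the data and function names are illustrative): batch processing computes a result over a complete, bounded dataset, while real-time processing updates a result incrementally as each event arrives.

```python
# Toy illustration of the two paradigms, not tied to any specific framework:
# batch aggregates a complete dataset; real-time updates state per event.

from typing import Iterable, Iterator

def batch_total(orders: list[float]) -> float:
    """Batch: the full dataset is available before processing starts."""
    return sum(orders)

def streaming_totals(orders: Iterable[float]) -> Iterator[float]:
    """Real-time: emit an updated running total as each event arrives."""
    total = 0.0
    for amount in orders:
        total += amount
        yield total  # downstream consumers see results immediately

if __name__ == "__main__":
    data = [10.0, 25.5, 7.25]
    print(batch_total(data))             # one result, after the fact
    print(list(streaming_totals(data)))  # a result per incoming event
```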

View more...

Apache Flink 101: A Guide for Developers

Aggregated on: 2024-08-08 20:52:53

In recent years, Apache Flink has established itself as the de facto standard for real-time stream processing. Stream processing is a paradigm for system building that treats event streams (sequences of events in time) as its most essential building block. A stream processor, such as Flink, consumes input streams produced by event sources and produces output streams that are consumed by sinks (the sinks store results and make them available for further processing). Household names like Amazon, Netflix, and Uber rely on Flink to power data pipelines running at tremendous scale at the heart of their businesses, but Flink also plays a key role in many smaller companies with similar requirements for being able to react quickly to critical business events.
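As a mental model of that source/processor/sink pipeline, here is a tiny framework-free Python sketch (illustrative only; Flink adds distributed execution, state, event-time handling, and fault tolerance on top of this basic shape):

```python
# Minimal sketch of the source -> stream processor -> sink model described
# above. All names are illustrative, not Flink API.

import time
from typing import Iterator

def source() -> Iterator[dict]:
    """An event source producing a (here, finite) stream of events."""
    for i in range(5):
        yield {"user": f"u{i % 2}", "clicks": i, "ts": time.time()}

def process(events: Iterator[dict]) -> Iterator[dict]:
    """The stream processor: filter and transform events as they flow by."""
    for event in events:
        if event["clicks"] > 0:
            yield {**event, "clicks_doubled": event["clicks"] * 2}

def sink(events: Iterator[dict]) -> None:
    """The sink stores results and makes them available downstream."""
    for event in events:
        print("stored:", event)

sink(process(source()))
```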

View more...

How To Create a CRUD Application in Less Than 15 Minutes

Aggregated on: 2024-08-08 18:07:53

CRUD applications form the backbone of most software projects today. If you're reading this article, chances are your project encountered some challenges, you’re seeking a faster way to accomplish this task, or you are looking for a Java framework to start with. You're not alone.  With the tech world constantly evolving, especially with tighter budgets, there's a noticeable shift towards frameworks that bring everything under one roof to reduce the need for oversized teams. 

View more...

Running PyTorch on GPUs

Aggregated on: 2024-08-08 16:07:53

Running an AI workload on a GPU machine requires the installation of kernel drivers and user-space libraries from GPU vendors such as AMD and NVIDIA. Once the driver and software are installed, to use AI frameworks such as PyTorch and TensorFlow, one needs to use the proper framework build for the GPU target. AI applications usually run on top of popular AI frameworks, which hide the tedious installation steps. This article highlights the importance of the hardware, driver, software, and frameworks for running AI applications or workloads. It covers the Linux operating system, the ROCm software stack for AMD GPUs, the CUDA software stack for NVIDIA GPUs, and PyTorch as the AI framework. Docker plays a critical part in bringing up the entire stack, allowing various workloads to be launched in parallel.
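As a small illustration of the framework layer, here is a PyTorch sketch of device selection; note that ROCm builds of PyTorch expose AMD GPUs through the same torch.cuda API, so code like this typically runs unchanged on both CUDA and ROCm stacks:

```python
# Device selection in PyTorch. ROCm builds also surface AMD GPUs via the
# torch.cuda namespace, so this works on both NVIDIA and AMD stacks.

import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")
if device.type == "cuda":
    print("GPU:", torch.cuda.get_device_name(0))

# Move the model and the data to the same device before running.
model = torch.nn.Linear(4, 2).to(device)
x = torch.randn(8, 4, device=device)
y = model(x)  # computation happens on the GPU when one is available
print(y.shape)
```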

View more...

JavaScript Frameworks: The Past, the Present, and the Future

Aggregated on: 2024-08-08 15:07:53

When we talk about web development, we cannot help but mention JavaScript. Throughout the past several decades, JavaScript frameworks have been the backbone of web development, defining its direction. The capabilities of JavaScript tools have been steadily growing, enabling the creation of faster, more complex, and more efficient websites. This evolution has made a huge leap from jQuery to React, Angular, and Vue.js. We will look at the major milestones in the evolution of JavaScript frameworks that have defined web development as we know it today.

The Early Days: jQuery and Its Impact

jQuery was created in 2005 by developer John Resig, who set out to realize an idea that was audacious for its time: making writing JavaScript code fun. To achieve this daring goal, he stripped common and repetitive tasks of excessive markup and made them short and understandable. This simple recipe helped him create the most popular JavaScript library in the history of the internet.

View more...

How To Scale RAG and Build More Accurate LLMs

Aggregated on: 2024-08-08 14:07:53

Retrieval augmented generation (RAG) has emerged as a leading pattern to combat hallucinations and other inaccuracies that affect large language model content generation. However, RAG needs the right data architecture around it to scale effectively and efficiently. A data streaming approach provides the foundation for an optimal architecture, supplying LLMs with large volumes of continuously enriched, trustworthy data to generate accurate results. This approach also allows data and application teams to work and scale independently to accelerate innovation. Foundational LLMs like GPT and Llama are trained on vast amounts of data and can often generate reasonable responses about a broad range of topics, but they do generate erroneous content. As Forrester noted recently, public LLMs “regularly produce results that are irrelevant or flat wrong,” because their training data is weighted toward publicly available internet data. In addition, these foundational LLMs are completely blind to the corporate data locked away in customer databases, ERP systems, corporate wikis, and other internal data sources. This hidden data must be leveraged to improve accuracy and unlock real business value.
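To ground the pattern, here is a bare-bones Python sketch of the retrieval step (the documents, toy vectors, and the call_llm placeholder are all illustrative; production systems use a real embedding model and a vector store):

```python
# Bare-bones RAG sketch: retrieve the most relevant documents for a query,
# then ground the LLM prompt in them. Toy vectors stand in for embeddings.

import numpy as np

docs = {
    "doc1": ("Acme's refund window is 30 days.", np.array([0.9, 0.1, 0.0])),
    "doc2": ("Acme ships worldwide.",            np.array([0.1, 0.8, 0.1])),
}

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve(query_vec: np.ndarray, k: int = 1) -> list[str]:
    scored = sorted(docs.values(), key=lambda d: cosine(query_vec, d[1]),
                    reverse=True)
    return [text for text, _ in scored[:k]]

query_vec = np.array([0.85, 0.15, 0.0])  # embedding of the user question
context = "\n".join(retrieve(query_vec))
prompt = (f"Answer using only this context:\n{context}\n\n"
          "Question: What is the refund policy?")
# response = call_llm(prompt)  # hypothetical LLM call
print(prompt)
```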

View more...

Leveraging Snowflake’s AI/ML Capabilities for Anomaly Detection

Aggregated on: 2024-08-08 13:22:53

Anomaly detection is the process of identifying data that deviates from expected results in time-series data. Such deviations can have a huge impact on forecasting models if not identified before model creation. The Snowflake Cortex AI/ML suite helps you train models to spot and correct these outliers and thereby improve the quality of your results. Detecting outliers also helps in identifying the source of deviations in processes. Anomaly detection works with both single-series and multi-series data. Multi-series data represents multiple independent threads of events. For example, if you have sales data for multiple stores, each store’s sales can be checked separately by a single model based on the store identifier. These outliers can be detected in time-series data using the Snowflake built-in class SNOWFLAKE.ML.ANOMALY_DETECTION.
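As a hedged sketch, here is how the class can be driven from Python via the Snowflake connector; the view and column names are hypothetical, and the exact CREATE/CALL syntax should be verified against the Snowflake Cortex ML documentation for your account:

```python
# Hedged sketch: train and invoke SNOWFLAKE.ML.ANOMALY_DETECTION from
# Python. View/column names are hypothetical placeholders.

import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account", user="my_user", password="...",  # placeholders
    warehouse="my_wh", database="my_db", schema="my_schema",
)
cur = conn.cursor()

# Train a model on historical sales (single- or multi-series).
cur.execute("""
    CREATE OR REPLACE SNOWFLAKE.ML.ANOMALY_DETECTION sales_anomaly_model(
        INPUT_DATA => SYSTEM$REFERENCE('VIEW', 'sales_training_view'),
        TIMESTAMP_COLNAME => 'sale_date',
        TARGET_COLNAME => 'daily_sales',
        LABEL_COLNAME => '')
""")

# Score new data: each returned row flags whether the point is an outlier.
cur.execute("""
    CALL sales_anomaly_model!DETECT_ANOMALIES(
        INPUT_DATA => SYSTEM$REFERENCE('VIEW', 'sales_new_view'),
        TIMESTAMP_COLNAME => 'sale_date',
        TARGET_COLNAME => 'daily_sales')
""")
for row in cur.fetchall():
    print(row)
```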

View more...

Semi-Supervised Learning: How To Overcome the Lack of Labels

Aggregated on: 2024-08-07 21:07:53

All successfully implemented machine learning models are backed by at least two strong components: data and model. In my discussions with ML engineers, I heard many times that, instead of spending a significant amount of time on data preparation, including labeling for supervised learning, they would rather spend their time on model development. When it comes to most problems, labeling huge amounts of data is way more difficult than obtaining it in the first place. Unlabeled data fails to provide the desired accuracy during training, and labeling huge datasets for supervised learning can be time-consuming and expensive. What if the data labeling budget was limited? What data should be labeled first? These are just some of the daunting questions facing ML engineers who would rather be doing productive work instead.
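One common answer is self-training, where a model's own confident predictions on unlabeled points become pseudo-labels. Here is a minimal scikit-learn sketch, with synthetic data standing in for a real dataset:

```python
# Self-training with scikit-learn: a small labeled set plus a larger
# unlabeled set (labels marked -1); confident predictions become
# pseudo-labels for the unlabeled points.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2)) + np.repeat([[0, 0], [3, 3]], 100, axis=0)
y_true = np.repeat([0, 1], 100)

# Pretend we could only afford to label 10% of the data.
y = np.full(200, -1)                      # -1 means "unlabeled" to sklearn
labeled_idx = rng.choice(200, size=20, replace=False)
y[labeled_idx] = y_true[labeled_idx]

clf = SelfTrainingClassifier(LogisticRegression(), threshold=0.9)
clf.fit(X, y)
print("accuracy on all points:", clf.score(X, y_true))
```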

View more...

10 Kubernetes Cost Optimization Techniques

Aggregated on: 2024-08-07 20:07:53

These are 10 strategies for reducing Kubernetes costs. We’ve split them into pre-deployment, post-deployment, and ongoing cost optimization techniques to help people at the beginning and middle of their cloud journeys, as well as those who have fully adopted the cloud and are just looking for a few extra pointers. So, let’s get started.

View more...

Docker vs. Podman: Exploring Container Technologies for Modern Web Development

Aggregated on: 2024-08-07 19:07:52

Docker and Podman are among the most widely used containerization technologies in software development. Examining their use cases, benefits, and limitations, this article offers a thorough comparison of Docker and Podman. We will also go over practical examples of deploying web apps with both technologies, highlighting important commands and considerations for producing container images.

Introduction

Containerization has become an essential technique for creating, transporting, and executing applications with unmatched uniformity across various computing environments. Docker, a pioneer in this field, has transformed software development practices by introducing developers to the capabilities and adaptability of containers. This technology employs containerization to package an application and all its necessary components into a self-contained unit, providing consistent functionality regardless of variations in development, staging, and production environments.

View more...

How To Check and Update Newer Versions for Dependencies in Maven Projects

Aggregated on: 2024-08-07 18:07:52

With the passing of time, new versions of dependencies are released. We need to update the respective dependency versions in the project, as these versions bring new changes and fixes for security vulnerabilities. It is better to update the dependencies in the project frequently. Now arises the question:

View more...

How You Can Avoid a CrowdStrike Fiasco

Aggregated on: 2024-08-07 17:07:52

By now we've all heard about —  or been affected by — the CrowdStrike fiasco. If you haven't, here's a quick recap. An update to the CrowdStrike Falcon platform, pushed on a Friday afternoon, caused computers to crash and be unbootable. The update was pushed to all customers at once, and the only way to recover was to boot into "Safe Mode" and uninstall the update. That often required direct physical access to the affected computer, making recovery times even longer.

View more...

Introduction to Salesforce Batch Apex [Video]

Aggregated on: 2024-08-07 16:07:52

Salesforce Batch Apex is a powerful tool for handling large data volumes and complex data processing tasks asynchronously. This tutorial will walk you through the core concepts and practical applications of Batch Apex in Salesforce, including the structure of an Apex batch class, writing unit tests for batch classes, scheduling batch classes, running batch classes ad hoc for testing, and understanding Batch Apex limits.

Structure of an Apex Batch Class

An Apex Batch class in Salesforce must implement the Database.Batchable interface. This interface requires the implementation of three methods:

View more...

SQL Interview Preparation Series: Mastering Questions and Answers Quickly

Aggregated on: 2024-08-07 15:07:52

Welcome to this lesson of our "SQL Interview Preparation Series: Mastering Questions and Answers Quickly"! Throughout this series, we aim to help you get ready for SQL interviews by delving into different topics. Today we're delving into the core differences between SQL and NoSQL databases, a key subject for any data-focused job interview.

Understanding SQL and NoSQL

Relational databases, commonly referred to as SQL databases, are designed to handle structured data. They adhere to a predefined schema, which makes them well-suited for situations where data integrity and consistency are crucial. On the other hand, NoSQL databases offer flexibility and scalability by managing data that is dynamic and rapidly changing. They are commonly used in web applications and social media platforms.

View more...

Enhancing Java Application Logging: A Comprehensive Guide

Aggregated on: 2024-08-07 14:52:53

Logging is crucial for monitoring and debugging Java applications, particularly in production environments. It provides valuable insights into application behavior, aids in issue diagnosis, and ensures smooth operation. This article will walk you through creating effective logs in Java applications, emphasizing three key aspects:

- Logging Good Information
- Creating Trackable Logs
- Ensuring Security and Avoiding Data Breaches

We will use the java.util.logging package for demonstration, but these principles apply to any logging framework.
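The article's examples are in Java; since it notes the principles carry over to any framework, here is a hedged sketch of the same three ideas using Python's standard logging module: useful context, a trackable correlation ID, and masking of sensitive values (the field names are illustrative):

```python
# Sketch of the three principles with Python's logging module: context,
# a trackable correlation ID, and redaction of sensitive values.

import logging
import uuid

class CorrelationFilter(logging.Filter):
    """Attach a per-request correlation ID so log lines can be traced."""
    def __init__(self, correlation_id: str):
        super().__init__()
        self.correlation_id = correlation_id

    def filter(self, record: logging.LogRecord) -> bool:
        record.correlation_id = self.correlation_id
        return True

def mask(value: str) -> str:
    """Never log raw secrets or PII; keep just enough to correlate."""
    return value[:2] + "***"

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s [%(correlation_id)s] %(message)s",
)
log = logging.getLogger("orders")
log.addFilter(CorrelationFilter(str(uuid.uuid4())[:8]))

log.info("order created: id=%s user=%s", 4711, mask("alice@example.com"))
```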

View more...

Enhancing Agile Product Development With AI and LLMs

Aggregated on: 2024-08-07 14:07:52

During my 10+ years of experience in Agile product development, I have seen the difficulties of meeting the rapid requirements of the digital market. Manual procedures can slow down highly flexible software engineering and delivery teams, resulting in missed chances and postponed launches.  With AI and Large Language Models (LLMs) becoming more prevalent, we are on the verge of a major change. Gartner points out a 25% increase in project success rates for those using predictive analytics (Gartner, 2021). These technologies are changing the way agile product development is optimized - by automating tasks, improving decision-making, and forecasting future trends. As stated in a report from McKinsey, companies using AI experience a 20% decrease in project costs (McKinsey & Company, 2023).

View more...

How To Create and Run A Job In Jenkins Using Jenkins Freestyle Project

Aggregated on: 2024-08-07 13:52:52

As per the official Jenkins wiki, a Jenkins freestyle project is a typical build job or task. It may be as simple as building or packaging an application, running tests, building or sending a report, or merely running a few commands. Jenkins can also collate data for tests. For instance, a real-world scenario could involve Jenkins submitting reports to a log management system at any specified stage, which may include details about artifacts or shipping application logs. In this Jenkins tutorial, we will dive deeper into how to create a job in Jenkins and, eventually, a Jenkins freestyle project. Let’s find out more about Jenkins build jobs before we begin creating a freestyle project.

View more...

extended Berkeley Packet Filter (eBPF) for Cloud Computing

Aggregated on: 2024-08-07 13:07:52

eBPF, or extended Berkeley Packet Filter, is a revolutionary technology with origins in the Linux kernel that can run sandboxed programs in a privileged context such as the operating system kernel. eBPF is increasingly being integrated into Kubernetes for various purposes, including network observability, security, and performance monitoring. 

View more...

JMeter Plugin HTTP Simple Table Server (STS) In-Depth

Aggregated on: 2024-08-06 23:07:52

The Need for the Creation of the STS Plugin From a Web Application in Tomcat

The idea of having a server to manage the dataset was born during the performance tests of the income tax declaration application of the French Ministry of Public Finance in 2012. The dataset consisted of millions of lines to simulate the tens of thousands of people who filled out their income tax return form per hour, and there were a dozen injectors to distribute the load of a performance test run. The dataset was consumable: once the line with a person's information was read, that person's information could not be taken again. Management of the dataset had been implemented in a centralized way with a Java web application (WAR) running in Tomcat, with injectors requesting rows of the dataset from the web application.

View more...

Why You Should Use Buildpacks Over Docker

Aggregated on: 2024-08-06 22:07:52

Docker is the obvious choice for building containers, but there is a catch: writing optimized and secure Dockerfiles and managing a library of them at scale can be a real challenge. In this article, I will explain why you may want to use Cloud Native Buildpacks instead of Docker.

Common Issue Using Docker

When a company begins using Docker, it typically starts with a simple Dockerfile. However, as more projects require Dockerfiles, the following problematic situation often comes up:

View more...

Building an LLM-Powered Product To Learn the AI Stack: Part 1

Aggregated on: 2024-08-06 21:07:52

Forget what you think you know about AI. It's not just for tech giants and universities with deep pockets and armies of engineers and grad students. The power to build useful intelligent systems is within your reach. Thanks to incredible advancements in Large Language Models (LLMs) – like the ones powering Gemini and ChatGPT – you can create AI-driven products that used to require a team of engineers. In this series, we'll demystify the process of building LLM-powered applications, starting with a delicious use case: creating a personalized AI meal planner.

Our Use Case

As an example use case for our journey, we're going to be building a meal-planning app. There’s no shortage of meal plans available online, including those customized for different needs (varying goals, underlying health conditions, etc.). The problem is that it’s often difficult (sometimes impossible) to find guidance tailored specifically for you without hiring a health professional.
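To preview the core loop of such an app, here is a hedged sketch: collect a user profile, build a constrained prompt, and call a hosted LLM. The OpenAI client and model name below are stand-ins and assumptions, not necessarily what this series uses:

```python
# Hedged sketch of a meal-planner core loop: profile -> prompt -> LLM call.
# The OpenAI client is a stand-in; the model name is an assumption.

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

profile = {
    "goal": "high protein",
    "allergies": ["peanuts"],
    "meals_per_day": 3,
}

prompt = (
    "You are a meal-planning assistant. Create a one-day plan as a "
    f"numbered list of {profile['meals_per_day']} meals. "
    f"Goal: {profile['goal']}. Strictly avoid: {', '.join(profile['allergies'])}."
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # assumed model name
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```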

View more...

Handling Schema Versioning and Updates in Event Streaming Platforms Without Schema Registries

Aggregated on: 2024-08-06 20:07:52

In real life, change is constant. As businesses evolve, the technology systems supporting them must also evolve. In many event-driven systems today, event streaming platforms like Kafka, Kinesis, and Event Hubs are crucial components facilitating communication between different technology systems and services. As these systems and services change, the schema of event streaming platform messages needs to be updated. The most common way to address this problem is by using schema registries like Confluent Schema Registry, AWS Glue Schema Registry, and Azure Schema Registry. However, in this article, I am going to discuss a simple solution that does not use any of these schema registries. Although I will use Kafka as an example in this article, this strategy can be applied to any other event streaming platform or messaging queue.
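One common registry-free strategy, sketched here in plain Python (not necessarily the exact approach the article develops), is to embed a schema version in every message envelope and have consumers dispatch on it, so old and new producers can coexist during a rollout:

```python
# Registry-free schema versioning: version every message envelope and
# dispatch on that version in the consumer.

import json

def make_order_v1(order_id: str, amount: float) -> bytes:
    return json.dumps({"schema_version": 1,
                       "order_id": order_id, "amount": amount}).encode()

def make_order_v2(order_id: str, amount: float, currency: str) -> bytes:
    # v2 adds a currency field without breaking v1 consumers.
    return json.dumps({"schema_version": 2, "order_id": order_id,
                       "amount": amount, "currency": currency}).encode()

def consume(raw: bytes) -> dict:
    msg = json.loads(raw)
    version = msg.get("schema_version", 1)
    if version == 1:
        msg["currency"] = "USD"   # apply the v1 default on read
    elif version != 2:
        raise ValueError(f"unsupported schema version: {version}")
    return msg

for raw in (make_order_v1("o-1", 9.99), make_order_v2("o-2", 5.00, "EUR")):
    print(consume(raw))  # both versions normalize to the v2 shape
```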

View more...

Not All MFA Is Equal: Lessons From MFA Bypass Attacks

Aggregated on: 2024-08-06 19:07:52

One-time passwords are one of the most relied-on forms of multi-factor authentication (MFA). They’re also failing miserably at keeping simple attacks at bay. Any shared secret a user can unknowingly hand over is a target for cybercriminals, even short-lived TOTPs. Consider this: What if the multi-factor authentication your users rely on couldn’t save your organization from a large-scale account takeover? That’s what happened to an organization using SMS one-time passwords to secure customer accounts. We’ll call the affected organization “Example Company,” or EC for short.

View more...

How To Solve OutOfMemoryError: Java Heap Space

Aggregated on: 2024-08-06 18:07:52

There are 9 types of java.lang.OutOfMemoryError, each signaling a unique memory-related issue within Java applications. Among these, java.lang.OutOfMemoryError: Java heap space stands out as one of the most prevalent and challenging errors developers encounter. In this post, we’ll delve into the root causes behind this error, explore potential solutions, and discuss effective diagnostic methods to troubleshoot this problem. Let’s equip ourselves with the knowledge and tools to conquer this common adversary.

JVM Memory Regions

To better understand OutOfMemoryError, we first need to understand the different JVM memory regions (see this video clip that gives a good introduction to different JVM memory regions). But in a nutshell, the JVM has the following memory regions:

View more...

What Is "Progressive Disclosure" and How Does It Impact Developer Portals?

Aggregated on: 2024-08-06 17:07:52

Progressive disclosure is a UX design pattern that reduces cognitive load by gradually revealing more complex information or features as the user progresses through the UI of a digital product (such as a portal).

Why Should Platform Engineers Care?

I've encountered the term a few times, but its value really hit home in July during two panels I participated in. First, in the LeadDev panel "How to implement platform engineering at scale," Smruti Patel mentioned this. Then, in a panel that I hosted with Abby Bangser, "When Terraform Met Backstage," our guest, Seve Kim, also mentioned progressive disclosure.

View more...

Harnessing DevOps Potential: Why Backup Is a Missing Piece

Aggregated on: 2024-08-06 16:07:52

We often hear about the importance of developers and the role they play in the success of a business. After all, they are the craftsmen who create the software and apps that make businesses run smoothly. However, there is one key element of development that is still overlooked – backup. Why? DevOps teams are constantly focused on delivering the best user experience and making sure the apps they build are bug-free. Yet what if something goes wrong one day? Let’s move on step by step.

View more...

Oracle: Migrate PDB to Another Database

Aggregated on: 2024-08-06 15:52:52

If you want to migrate or relocate a PDB from one database to another, there are multiple options available in Oracle. Here, we will discuss a few of them. The source and target databases can be standalone, RAC, cloud, or autonomous databases. After verifying the PDB is on the target, open it for customer access and remove it from the source database based on company policy.

Prerequisites

- The target database version should be the same as or higher than the source database.
- The source database should be accessible from the target.
- The degree of parallelism should be calculated properly.
- Be aware of a DBA-privileged username/password on the source to create the DB link.
- The encryption key is different from the user password. You must have access to the encryption key, which may be at the database, tablespace, or table level.
- The user in the remote database that the database link connects to must have the CREATE PLUGGABLE DATABASE privilege.
- The character sets on source and target should be compatible.

Known Issues

- Tablespaces may be in a big file.
- Tablespaces may be encrypted.
- Using a database link, the target database should be able to access the source database. Create an Access Control List (ACL) or whitelist the IP address and port if required.
- To access the DB link, either enter the source database information in tnsnames.ora or give a full connection string.
- Stable network connectivity is required between source and target.
- RMAN jobs may interfere with refreshable cloning.
- The port from source to target should be opened to copy the files or to access the DB link.
- The remote CDB should use local undo mode; otherwise, the remote PDB may be opened in read-only mode.
- Copy/cloning/synchronization time between source and target may vary with network traffic and speed.

A few of the approaches are as follows:

View more...

Quick Scrum Gains

Aggregated on: 2024-08-06 15:07:52

TL;DR: Quick Scrum Gains

Suppose you are a Scrum Master or Agile Coach. Have you recently been asked to explain your contribution to the organization’s value creation? In other words, does management want to know whether you are pulling your weight or if your salary is an expendable expense? This article points to ten quick Scrum gains you can pull off without asking for permission or budget to prove your contribution to your organization’s survival in these challenging times.

Ten Quick Scrum Gains You Can Start Tomorrow

A few years ago, when money was cheap, valuations high, and profits more than decent, no one questioned the necessity of a Scrum Master or Agile Coach.

View more...

How To Become a Software Engineer Without a CS Degree: Essential Strategies for Success

Aggregated on: 2024-08-06 14:52:52

Here is how I became a software engineer without a computer science degree. Let me be real with you: coding was hard. I wasted so much time fixing missing semicolons, mismatched brackets, and misspelled variables. Even when the code compiled, it would not work as expected, and I would spend hours staring at the screen and questioning my life choices. But over time, I picked up some strategies that made coding click for me, and I'm going to share these strategies with you today.

Don’t Try To Know Everything

The first thing I learned was that as a programmer, you don't need to know everything. When I began my first programming job, I was unfamiliar with Linux commands. When I joined Amazon, I did not fully understand G. At Amazon, my first project was in Python, and I had never written a single line of code in Python. Later, when I joined Google, I could not program in C++, but most of my work was in C++. The point I'm trying to make is that you don't need to know everything; you just need to know where to find it when you need it. When I was a beginner, I would try to do these 30-40 hour boot camps to learn a programming language, thinking that I was going to learn everything. In reality, you cannot learn everything there is to learn. So, do not wait until you have the right skills to start your project; your project will teach you the skills. Do not wait until you have the confidence to do what you want; the confidence will come when you start doing it.

View more...

Develop With OCI Real-Time Speech Transcription and Oracle Database NL2SQL/Select AI To Speak With Your Data

Aggregated on: 2024-08-06 14:07:52

Speak in your natural language, ask questions about your data, and have the answers returned to you in your natural language as well: that's the objective, and it is what I'll show in this quick blog post, with full source repos provided as always. I'll leave the use cases up to you from there. You can learn more about these Oracle Database features here for the free cloud version and here for the free container/image version. Also, you can check out the Develop with Oracle AI and Database Services: Gen, Vision, Speech, Language, and OML workshop, which explains how to create this application and numerous other examples, as well as the GitHub repos that contain all the source code.

View more...

Idempotency in Data Pipelines: Overview

Aggregated on: 2024-08-06 13:52:52

Idempotency is an important concept in data engineering, particularly when working with distributed systems or databases. In simple terms, an operation is said to be idempotent if running it multiple times has the same effect as running it once. This can be incredibly useful when dealing with unpredictable network conditions, errors, or other types of unexpected behavior, as it ensures that even if something goes wrong, the system can be brought back to a consistent state by simply running the operation again. In this blog post, we will take a look at some examples of how idempotency can be achieved in data engineering using Python.
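As a preview of the idea, here is a minimal Python sketch of one way to make a load step idempotent: key each record deterministically and upsert, so re-running after a failure cannot create duplicates (SQLite stands in for any keyed target store):

```python
# Idempotent load via keyed upsert: rerunning the same batch has the same
# effect as running it once. SQLite stands in for the target store.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (event_id TEXT PRIMARY KEY, amount REAL)")

def load(records: list[tuple[str, float]]) -> None:
    """Safe to call any number of times with the same input."""
    conn.executemany(
        "INSERT INTO sales (event_id, amount) VALUES (?, ?) "
        "ON CONFLICT(event_id) DO UPDATE SET amount = excluded.amount",
        records,
    )
    conn.commit()

batch = [("evt-1", 10.0), ("evt-2", 20.0)]
load(batch)
load(batch)  # retry after a simulated failure: no duplicates
print(conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0])  # -> 2
```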

View more...

Creating a Command Line Tool With JBang and PicoCLI To Generate Release Notes

Aggregated on: 2024-08-06 13:07:52

Lately, I have been playing with JBang and PicoCLI, and I am pretty amazed at what we can do with these tools. I needed to create a script that would go to a specified repository on GitHub, check the commit range, and verify if any tickets were associated with them. Additionally, I wanted to check if the ticket was accepted and if the commit was approved or not. The idea was to integrate this script along with the CI/CD pipeline. While the traditional approach might involve using bash scripts or Python, as a Java developer, I feel more at home doing this in Java. This is where JBang comes into the picture. And since I want this to be a command-line tool, PicoCLI comes in handy.

View more...

Buh-Bye, Webpack and Node.js; Hello, Rails and Import Maps

Aggregated on: 2024-08-05 23:07:52

I enjoy spending time learning new technologies. However, often the biggest drawback of working with new technologies is the inevitable pain points that come with early adoption. I saw this quite a bit when I was getting up to speed with Web3 in “Moving From Full-Stack Developer To Web3 Pioneer.” As software engineers, we’re accustomed to accepting these early-adopter challenges when giving new tech a test drive. What works best for me is to keep a running list of notes and commands I’ve executed, since seemingly illogical steps don’t remain in my memory.

View more...

Go Serverless: Unleash Next-Gen Computing

Aggregated on: 2024-08-05 22:07:52

In the digital revolution, where bytes fly faster than thoughts, one concept is bringing a paradigm shift to the tech cosmos: serverless computing. The thought of dealing with servers often makes us freak out. Server maintenance, scalability issues, and huge infrastructure costs can all be part of our nightmares. This is where serverless computing can be a game-changer. It aims to take the infrastructure trouble out of modern-day technology so we can just focus on coding. “Serverless” doesn't literally mean the servers completely vanish. Instead, they are hidden behind the curtains until summoned. Think of it like a magic genie that is always at your beck and call to grant your computing wishes without the hassles of hardware management.

View more...

Scaling Prometheus With Thanos

Aggregated on: 2024-08-05 21:07:52

Observability is a crucial pillar of any application, and monitoring is an essential component of it. Having a well-suited, robust monitoring system is crucial. It can help you detect issues in your application and provide insights once it is deployed. It aids in performance, resource management, and observability. Most importantly, it can help you save costs by identifying issues in your infrastructure. One of the most popular tools in monitoring is Prometheus. It sets a de facto standard with its straightforward and powerful query language PromQL, but it has limitations that make it unsuitable for long-term monitoring. Querying historical metrics in Prometheus is challenging because it is not designed for this purpose. Obtaining a global metrics view in Prometheus can be complex. While Prometheus can scale horizontally with ease on a small scale, it faces challenges when dealing with hundreds of clusters. In such scenarios, Prometheus requires significant disk space to store metrics, typically retaining data for around 15 days. For instance, generating 1TB of metrics per week can lead to increased costs when scaling horizontally, especially with the Horizontal Pod Autoscaler (HPA). Additionally, querying data beyond 15 days without downsampling further escalates these costs.

View more...

Reimagining AI: Ensuring Trust, Security, and Ethical Use

Aggregated on: 2024-08-05 20:07:52

The birth of AI dates back to the 1950s when Alan Turing asked, "Can machines think?" Since then, 73 years have passed, and technological advancements have led to the development of unfathomably intelligent systems that can recreate everything from images and voices to emotions (deep fake). These innovations have greatly benefited professionals in countless fields, be they data engineers, healthcare professionals, or finance personnel. However, this increased convergence of AI within our daily operations has also posed certain challenges and risks, and the assurance of reliable AI systems has become a growing concern nowadays.

View more...

Building an IoT-based Waste Management System: A Software Architect's Guide

Aggregated on: 2024-08-05 19:07:51

The Internet of Things is a network of physical devices. These devices can be anything, like smart bins or home appliances. They have sensors that collect information. They also have software that processes this information. These devices are connected to the internet. This allows them to share the data they collect. For example, a smart bin can tell how full it is and send this information to a cloud platform. We can use IoT to manage waste better. Sensors can gather data about waste levels. This helps in organizing waste collection more efficiently.
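As a toy illustration of that flow, here is a hedged Python sketch of a smart bin publishing its fill level to a cloud ingestion endpoint; the URL is a placeholder and the sensor read is simulated:

```python
# Simulated smart bin pushing fill-level readings to a cloud platform.
# The ingestion URL is hypothetical; the sensor read is faked.

import random
import time

import requests

INGEST_URL = "https://cloud.example.com/api/bins/readings"  # placeholder

def read_fill_level() -> float:
    """Stand-in for the bin's ultrasonic or weight sensor."""
    return round(random.uniform(0.0, 1.0), 2)

for _ in range(3):
    reading = {"bin_id": "bin-42", "fill_level": read_fill_level(),
               "ts": time.time()}
    # The platform can schedule a pickup once fill_level crosses a threshold.
    resp = requests.post(INGEST_URL, json=reading, timeout=5)
    resp.raise_for_status()
    time.sleep(1)
```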

View more...

How To Setup OAuth JWT in the Salesforce Connector

Aggregated on: 2024-08-05 18:07:51

In this post, we'll explain all the steps required to connect a Mule application to Salesforce using the Salesforce connector with the OAuth JWT flow. You can also create your own certificate for the OAuth JWT flow with Salesforce or with OpenSSL (signed by a CA or self-signed). Both options are very well explained in the video at the conclusion of the article from Stefano Bernardini, MuleSoft Ambassador. In this post, we’ll be using a self-signed certificate created by Salesforce but keep in mind that, for production environments, a certificate issued by a Trusted Certificate Authority is always recommended.

View more...

Streaming Data Joins: A Deep Dive Into Real-Time Data Enrichment

Aggregated on: 2024-08-05 17:07:51

Introduction to Data Joins

In the world of data, a "join" is like merging information from different sources into a unified result. To do this, it needs a condition – typically a shared column – to link the sources together. Think of it as finding common ground between different datasets. In SQL, these sources are referred to as "tables," and the result of using a JOIN clause is a new table. Fundamentally, traditional (batch) SQL joins operate on static datasets, where you have prior knowledge of the number of rows and the content within the source tables before executing the join. These join operations are typically simple to implement and computationally efficient. However, the dynamic and unbounded nature of streaming data presents unique challenges for performing joins in near-real-time scenarios.
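To make the contrast tangible, here is a compact Python sketch of a streaming enrichment join (illustrative only): instead of joining two complete tables, the processor keeps the latest state of one side and enriches events as they flow past; real engines add windowing and time semantics on top.

```python
# Streaming enrichment join: keep the latest state of one side (user
# profiles) and enrich unbounded events against it as they arrive.

from typing import Iterator

profile_state: dict[str, dict] = {}  # latest known row per join key

def handle_profile_update(update: dict) -> None:
    profile_state[update["user_id"]] = update

def enrich(events: Iterator[dict]) -> Iterator[dict]:
    for event in events:
        profile = profile_state.get(event["user_id"])
        if profile is None:
            # In a real system: buffer, emit with nulls, or side-output.
            continue
        yield {**event, "country": profile["country"]}

handle_profile_update({"user_id": "u1", "country": "DE"})
clicks = [{"user_id": "u1", "page": "/home"}, {"user_id": "u2", "page": "/x"}]
print(list(enrich(iter(clicks))))  # u2 dropped: no profile seen yet
```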

View more...

Building a To-Do List With MongoDB and Golang

Aggregated on: 2024-08-05 16:07:51

Hi there! Many have wondered how a simple task sheet, or the applications that provide such functionality, works under the hood. In this article, I invite you to consider how you can write your own small service in Go in a couple of hours and store everything in a database. Let's start our journey with Golang and MongoDB.

View more...

Free Tier API With Apache APISIX

Aggregated on: 2024-08-05 15:52:52

Lots of service providers offer a free tier of their service. The idea is to let you kick their service's tires freely. If you need to go above the free tier at any point, you'll likely stay on the service and pay. In this day and age, most services are online and accessible via an API. Today, we will implement a free tier with Apache APISIX.

A Naive Approach

I implemented a free tier in my post, "Evolving Your RESTful APIs: A Step-by-Step Approach," albeit in a very naive way. I copy-pasted the limit-count plugin and added my required logic.

View more...

Finding Your Voice: Navigating Tech as a Solo Female Engineer on Your Team

Aggregated on: 2024-08-05 15:07:51

For most of my career, I have been the only female engineer on my team. You may wonder, what’s so significant about that? As I navigated the tech industry as the only female engineer on my team, I often felt isolated and lonely. The lack of community and a sense of belonging led to a growing imposter syndrome, and unfortunately, many women in tech resonate with this feeling. Throughout my 5+ years of experience as a software engineer, I have realized the importance of having a strategy and a supportive network to navigate this landscape. Here are some of my tips to tackle this head-on:

View more...

Harnessing the Power of AWS Aurora for Scalable and Reliable Databases

Aggregated on: 2024-08-05 14:52:52

In the era of digital transformation, businesses require database solutions that provide scalability and reliability. AWS Aurora, a relational database that supports MySQL and PostgreSQL, has become a popular choice for companies looking for high performance, durability, and cost efficiency. This article delves into the benefits of AWS Aurora and presents a real-life example of how it is used in an online social media platform.

Comparison of AWS Aurora: Benefits vs. Challenges

Key Benefits | Description | Challenges | Description
High Performance and Scalability | Aurora's design segregates storage and computing functions, delivering a bandwidth that is five times greater than MySQL and twice that of PostgreSQL. It guarantees consistent performance even during peak traffic periods by utilizing auto-scaling capabilities. | |

View more...