Welcome to the Data Engineering category of DZone, where you will find all the information you need for AI/ML, big data, data, databases, and IoT. As you determine the first steps for new systems or reevaluate existing ones, you're going to require tools and resources to gather, store, and analyze data. The Zones within our Data Engineering category contain resources that will help you expertly navigate through the SDLC Analysis stage.
Artificial intelligence (AI) and machine learning (ML) are two fields that work together to create computer systems capable of perception, recognition, decision-making, and translation. Separately, AI is the ability for a computer system to mimic human intelligence through math and logic, and ML builds off AI by developing methods that "learn" through experience and do not require instruction. In the AI/ML Zone, you'll find resources ranging from tutorials to use cases that will help you navigate this rapidly growing field.
Big data comprises datasets that are massive, varied, complex, and can't be handled traditionally. Big data can include both structured and unstructured data, and it is often stored in data lakes or data warehouses. As organizations grow, big data becomes increasingly more crucial for gathering business insights and analytics. The Big Data Zone contains the resources you need for understanding data storage, data modeling, ELT, ETL, and more.
Data is at the core of software development. Think of it as information stored in anything from text documents and images to entire software programs, and these bits of information need to be processed, read, analyzed, stored, and transported throughout systems. In this Zone, you'll find resources covering the tools and strategies you need to handle data properly.
A database is a collection of structured data that is stored in a computer system, and it can be hosted on-premises or in the cloud. As databases are designed to enable easy access to data, our resources are compiled here for smooth browsing of everything you need to know from database management systems to database languages.
IoT, or the Internet of Things, is a technological field that makes it possible for users to connect devices and systems and exchange data over the internet. Through DZone's IoT resources, you'll learn about smart devices, sensors, networks, edge computing, and many other technologies — including those that are now part of the average person's daily life.
Creating a Web Project: Caching for Performance Optimization
The Future of Java and AI: Coding in 2025
The Internet of Things has shaken up our lives, connecting everything from smart homes to massive industrial systems in a pretty smooth way. Sure, these tech upgrades make our day-to-day so much easier, but they have also brought some real concerns about security and privacy. With billions of IoT devices out there, are we really ready for the growing cybersecurity threats? In this article, we'll dive into the biggest IoT security challenges, the key data privacy risks, and some practical solutions to help keep our sensitive information safe in this ever-expanding digital world.

Why Is Cybersecurity Crucial for IoT Devices?

The Internet of Things poses some of the biggest threats to consumer, business, and government security. Billions of networked devices steadily increase the opportunities for cyber-attacks. With the number of IoT devices expected to reach 25 billion in 2025, improving security is more vital than ever. Users, IT teams, and organizations must apply regular updates, strengthen encryption, and enforce multi-factor authentication to secure their IoT devices and networks.

Understanding IoT Security Requirements

IoT security requirements should support a strategy tailored to the industry, the business, and the network. Protection requires rigorous administrative oversight, regular patches and updates, strong passwords, and a focus on Wi-Fi security. Furthermore, deviations in network and device behavior can be monitored to detect malware exploiting an IoT device vulnerability. Network segmentation is a best practice for IoT devices: it isolates vulnerable devices and prevents malware from spreading. Applying zero-trust network access provides an additional layer of security, and cloud-based security solutions can add yet another layer while bringing processing capabilities to devices at the edge.

Key Data Privacy Concerns in IoT

Many devices integrate smoothly into our homes, offices, and public spaces and collect huge amounts of data. That data is invaluable for making life more connected, yet it poses significant risks if not properly managed.

Data Collection and Use

McKinsey estimates that IoT's potential economic impact could reach $11.1 trillion per year by 2025, largely thanks to the insights derived from data collected across many sources. However, concerns remain about data transparency and misuse: without clear disclosure, data can be shared with advertisers or other third parties without authorization, potentially violating users' privacy.

Data Security Vulnerabilities

Many devices ship with weak default security settings and inadequate security protocols, which makes them easy targets for cyber-attacks. A study by Armis found that cybersecurity attack attempts more than doubled in 2023, increasing by 104%, with many devices compromised due to fundamental security flaws like default passwords or unpatched vulnerabilities.

Lack of User Control

Consumers often face complex, jargon-filled privacy policies that obscure the extent of data collection and use. This complexity limits users' ability to make informed decisions about their data.
Furthermore, a survey by the Pew Research Center found that 62% of Americans believe it is impossible to go through daily life without companies collecting data about them, reflecting a resignation to the loss of control over personal information.

Major IoT Security Challenges

IoT security faces several challenges that leave networks vulnerable to cyberattacks. Many security systems fail to detect connected IoT devices or track their communication, making those devices easy targets.

Weak Authentication and Authorization

Many IoT devices depend on default passwords and lack strong authentication measures, which makes them easy targets for hackers.

Tip: Use multi-factor authentication (MFA) and role-based access control (RBAC) to strengthen IoT security.

Lack of Encryption

Unencrypted IoT network traffic exposes sensitive information to ransomware attacks and data breaches.

Tip: Apply end-to-end encryption (E2EE) to protect both data in transit and data at rest across networks. Secure communication channels with TLS/SSL and deploy VPNs for remote device access.

Insecure Communication Protocols and Channels

IoT devices often share networks with other devices, allowing cybercriminals to exploit weak communication channels.

Tip: Transmit data only over secure protocols such as HTTPS, MQTTS, and TLS. Combine firewalls with network segmentation so that IoT networks operate separately from critical infrastructure and potential attackers cannot spread further within the network.

Difficulty in Patching and Updates

Many IoT devices are not designed to receive regular software patches, allowing long-term security threats to develop. Security measures built into devices are essential for organizations to maintain a protected device infrastructure.

Tip: An effective approach is to adopt IoT device management tools that enable remote updates, along with a lifecycle management plan that automates replacement of devices that can no longer receive security updates.

Best Practices for Securing IoT Devices

Ensuring IoT device security requires proactive measures to protect your data and network. Here are the key best practices to follow:

Keep Software and Firmware Updated
Always install the latest updates for your IoT devices. Enable automatic updates or check the manufacturer's website regularly.

Change Default and Weak Passwords
Default passwords are easy targets for cybercriminals. Set unique, strong passwords (at least 12 characters with letters, numbers, and symbols) for all IoT devices.

Secure Your Router and Wi-Fi Network
Rename your router to avoid revealing its make/model and enable WPA2 or WPA3 encryption for enhanced security.

Review Privacy and Security Settings
Adjust default privacy settings to limit data exposure and disable unnecessary device features like Bluetooth or NFC to minimize attack vectors.

Enable Multi-Factor Authentication (MFA)
Where possible, activate MFA to add a layer of protection. It ensures access requires more than just a password.

Avoid Public Wi-Fi Risks
When managing IoT devices on the go, use a VPN to prevent cyber threats over unsecured public networks.

The Role of Governments and Industry Standards

The General Data Protection Regulation (GDPR), created by the EU in 2018, established worldwide data security guidelines.
GDPR focuses on transparency and user control regarding personal information and requires responsible data management. It affects all organizations, both inside and outside the EU, that make their products or services available to EU customers. Under GDPR, personal data must be protected and privacy rights must be respected, which creates implementation challenges for IoT systems because of its 'privacy by design and by default' requirement. GDPR has transformed data handling practices, since organizations that fail to comply face heavy fines. The United States, by contrast, has limited federal data protection regulation and relies mainly on state-specific laws such as the California Consumer Privacy Act (CCPA). This fragmented patchwork of U.S. privacy laws creates significant compliance barriers and is prompting conversations about a single federal privacy standard similar to GDPR. Organizations are also stepping up to protect the IoT ecosystem: big names like Google, Microsoft, and Cisco are investing in cybersecurity solutions to boost encryption, lock down their networks, and mitigate cyberattacks.

The Future of IoT Security and Privacy

The Internet of Things is transforming how people interact with technology, and it delivers clear gains in both efficiency and convenience. To ensure device security, manufacturers need to implement robust encryption, keep software updated, and provide transparent information about their products. Users need to understand the privacy risks that IoT devices create, review privacy policies, and apply sound cybersecurity measures to reduce those risks. Securing IoT requires coordinated effort from manufacturers, users, policymakers, and technology companies alike. Privacy and security must be treated as core aspects of every IoT solution design.
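To make the encryption advice above more concrete, here is a minimal Java sketch of an IoT sensor publishing over MQTT secured with TLS (MQTTS), using the Eclipse Paho client. Treat it as an illustration only: the broker address, credentials, and topic are placeholder assumptions, not values from a real deployment.

Java
import javax.net.ssl.SSLContext;
import org.eclipse.paho.client.mqttv3.MqttClient;
import org.eclipse.paho.client.mqttv3.MqttConnectOptions;
import org.eclipse.paho.client.mqttv3.MqttMessage;

public class SecureSensorPublisher {

    public static void main(String[] args) throws Exception {
        // "ssl://" plus port 8883 tells Paho to use TLS instead of plain TCP (port 1883).
        // Broker host, client ID, credentials, and topic below are illustrative placeholders.
        MqttClient client = new MqttClient("ssl://broker.example.com:8883", "sensor-42");

        MqttConnectOptions options = new MqttConnectOptions();
        options.setUserName("sensor-42");                              // no default passwords
        options.setPassword("a-strong-unique-password".toCharArray());
        options.setCleanSession(true);
        // Use the JVM's default trust store for the broker's certificate.
        options.setSocketFactory(SSLContext.getDefault().getSocketFactory());

        client.connect(options);

        MqttMessage message = new MqttMessage("{\"temp\": 21.5}".getBytes());
        message.setQos(1); // at-least-once delivery
        client.publish("building/floor1/temperature", message);

        client.disconnect();
    }
}

In a production setup you would typically also pin the broker's CA certificate or use mutual TLS rather than relying on the JVM's default trust store.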
In an era of ever-growing digital footprints, reducing security debt has become critical for organizations operating in the cloud. A backlog of unresolved security findings leaves services vulnerable to emerging threats and puts compliance and governance at risk. To tackle this problem at scale, organizations need an efficient method for prioritizing security risks by severity across different teams. A forward-thinking solution is to create a centralized security graph that merges various risk and compliance signals into one unified view. Such platforms enable engineering and security teams to discover and manage their most critical security risks by assessing real business impact and risk severity rather than age or backlog size.

Why Security Debt Is a Silent Threat

Security debt remains invisible until an organization faces a critical situation. It consists of known security vulnerabilities, unaddressed weaknesses, and unresolved actions that have not received timely attention. As these issues accumulate, they increase exposure to breaches and regulatory penalties while eroding customer trust. Traditionally, teams resolve their oldest issues first or clear items based on project milestones. That approach fails to surface the most critical risks. Risk-based prioritization needs to replace age-based decision making because it focuses on potential impact and likelihood of exploitation.

Introducing a Risk-Driven Prioritization Model

Modern cloud security teams solve this problem by integrating security debt data into a security graph that combines KPIs, security debt, service attribution, and inventory details with a unified risk prioritization framework. These tools give service teams a "single pane of glass" from which to monitor and respond to security risks. The recommended method unifies security control KPIs (Key Performance Indicators) with active risk registers, compliance programs, and engineering telemetry. The system assigns a risk score to each KPI based on severity, likelihood, compliance impact, and other key attributes. These scores then help teams understand:

What risks they should address immediately.
How security risk is distributed across services.
Where to invest engineering effort to get the most impact.

How It Works

The KPI Prioritization Engine is the key part of this framework; it prioritizes work based on the risk associated with any exposed gaps. Each KPI represents a specific security control or vulnerability category and is mapped to a risk profile that includes its category (incident, vulnerability, or weakness), severity, age, impact, and likelihood. These attributes are weighted and scored to generate a KPI score, which in turn feeds both service-level and organization-level risk metrics. For example, a service with critical, unresolved vulnerabilities will score higher (i.e., riskier) than one with only minor compliance weaknesses. Prioritization within each tier ensures that high-risk issues rise to the top of the action list, giving engineering teams clear guidance on where to focus to resolve high-impact security debt.

From KPI to Organizational Health

The strength of this framework is that it provides multi-tier risk prioritization, from individual KPIs through services up to organizational health.
By aggregating KPI scores, organizations can produce:

Service Risk Scores – representing the total security debt for a single cloud service.
Out-of-SLA Risk Scores – focusing specifically on overdue actions for a cloud service.
Security Compliance Scores – showing how teams are performing against compliance, governance, and audit criteria.
Org-Level Scores – offering a view of the top 25% riskiest services within a group or organization.

With this level of insight, security and engineering leaders can not only track progress but also allocate resources more strategically and defend security investments with data.

Why This Matters

Shifting from age-based prioritization to a risk-based approach allows organizations to reduce their top security risks more effectively. It supports a culture of continuous improvement by aligning engineering priorities with actual threats and compliance urgency. Moreover, it offers transparency and accountability by showing which services carry the most risk and why.

Key Takeaways for Reducing Security Debt

Use a centralized graph – Unify KPI tracking, risk registers, and compliance programs in one view.
Prioritize by risk parameters, not age – Focus first on issues with the highest potential impact or exploitation likelihood.
Drive consistency with risk scoring – Apply a standard framework across teams to reduce subjective prioritization.
Track security posture at every level – From individual services to entire organizations, use score-based insights to drive accountability.
Continuously update and improve – Keep refining prioritization algorithms as new KPIs and risk types emerge.

What's Next?

Organizations should look to expand this framework to include broader service health metrics, such as performance and reliability signals, alongside security. The goal should be a comprehensive Service Health Score that reflects both the resilience and the security of cloud services, enabling teams to proactively manage the health of their systems end to end. Security debt reduction is no longer just about fixing bugs or checking boxes. It's about making informed decisions to protect services, build trust, and focus on a long-term resiliency strategy. With the right prioritization framework and mindset, reducing security debt is achievable.
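To illustrate the kind of scoring model described above, here is a minimal, hypothetical Java sketch of a KPI risk score and a simple service-level aggregation. The weights, attribute scales, and names are illustrative assumptions; the article does not prescribe a specific formula.

Java
import java.util.List;

// Hypothetical risk profile for a single KPI (a security control or vulnerability category).
record KpiRiskProfile(String name,
                      String category,      // "incident", "vulnerability", or "weakness"
                      int severity,         // 1 (low) .. 5 (critical)
                      int likelihood,       // 1 .. 5
                      int complianceImpact, // 1 .. 5
                      long ageInDays) {

    // Weighted score: severity and likelihood dominate; age only nudges the result.
    // The weights below are illustrative, not taken from the article.
    double score() {
        double ageFactor = Math.min(ageInDays / 90.0, 1.0); // cap the age contribution
        return 0.45 * severity + 0.35 * likelihood + 0.15 * complianceImpact + 0.05 * ageFactor * 5;
    }
}

class ServiceRiskScore {
    // Service-level risk: here simply the sum of its KPI scores, so services with
    // many critical, unresolved findings rise to the top of the action list.
    static double forService(List<KpiRiskProfile> kpis) {
        return kpis.stream().mapToDouble(KpiRiskProfile::score).sum();
    }

    public static void main(String[] args) {
        List<KpiRiskProfile> orderService = List.of(
                new KpiRiskProfile("unpatched-critical-cve", "vulnerability", 5, 4, 3, 20),
                new KpiRiskProfile("expired-tls-cert", "weakness", 3, 3, 4, 120));
        System.out.printf("order-service risk score: %.2f%n", forService(orderService));
    }
}

Aggregating these service scores across a group (for example, averaging them or taking the top 25% riskiest services) would yield the org-level view described above.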
Over the last few years, the field of large language models (LLMs) has developed rapidly, and these models now underpin everything from customer service chatbots to enterprise-grade solutions. As such models become woven into the fabric of daily operations, what matters extends beyond privacy to strong data governance. The operational infrastructure around LLMs is changing quickly, with a growing focus on security, compliance, and data protection as rapid adoption across sectors makes these concerns pressing. 2025 is a watershed moment for organizations to shore up their data governance and privacy frameworks against the regulatory and technological changes reshaping AI programs. This article discusses best practices for privacy and data governance in LLMOps that allow organizations to guard sensitive data while enabling innovation in AI.

The State of LLMOps in 2025

LLMOps covers the deployment, management, and optimization of large language models. As the discipline matures, it increasingly encompasses security and privacy, compliance and risk, and data governance. Reports suggest that LLMOps can not only shorten time-to-market for AI solutions but also improve model reliability and manage regulatory risk. In 2025, AI governance will continue to grow in importance, and the increasing maturity of LLMOps will help organizations deploy and maintain LLMs in a secure, compliant, and ethical manner. Data governance and data privacy are paramount in AI deployment; organizations need to ensure that the risks of LLM deployments are mitigated. LLMs are trained on enormous datasets, many of which contain sensitive information. Data leakage and the unintended disclosure of personal information are persistent challenges, especially in healthcare, finance, and legal services. As privacy concerns rise and compliance requirements become more stringent, organizations must ensure that their AI operations are fully aligned with global privacy regulations.

Privacy and Data Governance Challenges

Data Privacy and Leakage

Large language models are built to process massive data volumes from diverse sources, which makes strong data protection mechanisms essential. Leakage of sensitive information, whether through accidental disclosure or adversarial manipulation, severely compromises privacy. For instance, LLM-enabled customer service bots may unintentionally disclose sensitive customer information while responding to prompts, undermining consumer trust and contravening data privacy laws. Compliance, meanwhile, is a constant burden for the enterprise. Organizations must navigate a complicated web of legal obligations imposed by data privacy laws such as GDPR and CCPA, along with sector-specific laws like HIPAA in healthcare. Failure to comply can bring heavy penalties and reputational damage.

Adversarial Threats

Another major threat to privacy and data integrity is adversarial attacks. Attackers could exploit weaknesses in LLMs to perform a range of actions, including prompt injections that disable security filters, extract proprietary data, or manipulate the model's output toward specific ends.
This underscores the need for very strong security controls in the LLMOps infrastructure to guard against such adversarial attacks.

Model and Supply Chain Security

Another emerging risk within LLMOps is model and supply chain security. As organizations increasingly rely on third-party APIs or open-source models, the potential for unauthorized access, data exfiltration, or security breaches grows. Supply chain attacks can compromise the integrity of LLMs, leading to data leakage or the introduction of malicious code into otherwise secure environments.

Research Insights and Case Studies

Advanced privacy and data governance frameworks are already proving effective in LLMOps. The deployment of OneShield Privacy Guard is one such case study: the tool scored an F1 of about 95% in detecting sensitive entities across 26 languages, outperformed other privacy solutions by 12%, and saved over 300 hours of manual privacy review within three months. This shows that automated privacy frameworks can enforce sensitive data security while improving operational efficiency in enterprise LLMOps. In an actual deployment, the automated guardrails raised privacy flags on 8.25% of 1,256 pull requests, demonstrating that contextually aware privacy structures can detect and mitigate privacy violations in an LLM environment.

Best Practices for Privacy and Data Governance in LLMOps

In light of the above challenges, organizations must adopt comprehensive strategies for data governance, security controls, and privacy-preserving techniques in their LLMOps processes.

Data Governance and Management

1. Comprehensive Governance Frameworks
Organizations need data governance frameworks that set policies for data access, encryption, and anonymization. The effectiveness of these frameworks lies in ensuring compliance with data handling and privacy laws and industry standards.

2. Regular Audits and Reviews
Data pipelines and model outputs should be audited regularly for privacy risks. Although most issues can be caught automatically, a human occasionally has to review cases where sensitive data is embedded in unstructured forms such as text or images. In addition, data minimization and pseudonymization limit the risk of disclosing personally identifiable information (PII).

3. Third-Party Risk Management
Where third-party APIs and open-source models are part of LLMOps, managing and appraising third-party risk becomes extremely important. Organizations should implement stringent access control mechanisms and conduct regular security audits to prevent supply chain vulnerabilities.

Security Controls

1. Access Controls and Authentication
Enforcing strong access controls is one of the most effective ways to protect LLMs and sensitive data. Role-based access control (RBAC), multi-factor authentication (MFA), and proper API key management are essential components of LLMOps security.

2. Data Encryption
All data, both at rest and in transit, should be encrypted using strong standards such as AES (Advanced Encryption Standard). Secure key management practices are essential to ensure that encryption remains robust and accessible only to authorized users.
3. AI Firewalls and Prompt Security Filters
Implementing AI firewalls and prompt security filters can help mitigate adversarial threats by blocking malicious inputs designed to exploit vulnerabilities in LLMs. These security layers provide an additional safeguard against prompt injections and other forms of adversarial manipulation.

Privacy-Preserving Techniques

1. Differential Privacy
Differential privacy adds calibrated noise to data or model outputs, making it very difficult to identify any individual within the dataset. This allows LLMs to be trained on sensitive data without compromising privacy.

2. Federated Learning
In federated learning, the model is trained locally on user devices, so raw data never leaves the device. This approach reduces the risk of data exposure while still enabling organizations to build powerful LLMs with decentralized data.

3. Context-Aware Entity Recognition Tools
For real-time detection and redaction of sensitive information, context-aware entity recognition tools within LLMOps can help identify and protect sensitive data in both inputs and outputs.

Compliance and Monitoring

1. Regulatory Alignment
To ensure compliance with laws like GDPR, CCPA, and HIPAA, LLMOps processes must be aligned with these frameworks. This includes maintaining detailed audit logs of model outputs and data access, along with compliance documentation.

2. Incident Monitoring and Response
Using automated SIEM systems to continuously monitor for vulnerabilities and data breaches is critical. Organizations should also have incident response plans in place, enabling rapid action in the event of a security breach or privacy violation.

Organizational Practices

1. Responsible AI Committee
Establishing a Responsible AI Committee can help oversee privacy, security, and compliance throughout the LLM lifecycle. This cross-functional team should include representatives from legal, security, and data governance to ensure comprehensive oversight of LLMOps processes.

2. Ongoing Security Training
Continuous security training for developers and operational teams is crucial for fostering a culture of privacy-first AI. Regular workshops and awareness campaigns ensure that all stakeholders understand the risks and the best practices for securing LLM operations.

Emerging Trends in LLMOps Security and Privacy

1. Zero-Trust AI Security Models
One rising trend is the application of the zero-trust security model within LLMOps. This model assumes that no entity, inside or outside the organization, is trusted by default. Organizations can improve tamper resistance and data traceability through AI red-teaming, self-healing systems, and blockchain-based data provenance.

2. Automated Privacy Guardrails
Tools like OneShield Privacy Guard are setting new standards for scalable, context-aware privacy protection, allowing organizations to automatically flag privacy risks without constant human oversight.

Conclusion

As organizations adopt LLMOps to deploy and manage large language models, they must prioritize privacy and data governance as part of their compliance, security, and trust regime. By adopting privacy-preserving techniques, implementing strong security controls, and ensuring regulatory compliance, businesses can protect sensitive data while maintaining ethical AI standards.
As the state of the art advances, these best practices will evolve and become the bedrock for securing LLMOps as AI capabilities continue to scale, preserving user privacy and trust even at today's accelerated pace of development.
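As a small, concrete illustration of the differential privacy technique mentioned above, here is a hedged Java sketch of the Laplace mechanism applied to a numeric aggregate (for example, a count released from a sensitive dataset). The epsilon and sensitivity values are illustrative; real LLM training pipelines typically apply differential privacy through specialized methods such as DP-SGD rather than this simple mechanism.

Java
import java.util.Random;

// Minimal sketch of the Laplace mechanism used in differential privacy:
// noise scaled to (sensitivity / epsilon) is added to an aggregate before release.
public class LaplaceMechanism {

    private final Random random = new Random();

    // Draw Laplace(0, scale) noise via inverse transform sampling.
    private double laplaceNoise(double scale) {
        double u = random.nextDouble() - 0.5;               // uniform in [-0.5, 0.5)
        return -scale * Math.signum(u) * Math.log(1 - 2 * Math.abs(u));
    }

    // Release a differentially private version of a numeric query result.
    public double privatize(double trueValue, double sensitivity, double epsilon) {
        return trueValue + laplaceNoise(sensitivity / epsilon);
    }

    public static void main(String[] args) {
        LaplaceMechanism dp = new LaplaceMechanism();
        double trueCount = 1280;  // e.g., number of records matching a query
        // Counting queries have sensitivity 1; epsilon = 0.5 is an illustrative privacy budget.
        double noisyCount = dp.privatize(trueCount, 1.0, 0.5);
        System.out.printf("true = %.0f, released = %.1f%n", trueCount, noisyCount);
    }
}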
If you're like me, learning how to run databases inside Kubernetes sounds better when it's hands-on, physical, and brutally honest. So instead of spinning up cloud VMs or using Kind or minikube on a laptop, I went small and real: four Orange Pi 3 LTS boards (a Raspberry Pi alternative), each with just 2GB RAM. My goal? Get MariaDB — and eventually Galera replication — running on Kubernetes using the official MariaDB Kubernetes Operator.

TL;DR: If you came here for the code, you can find Ansible playbooks on this GitHub repository, along with instructions on how to use them. For production environments, see this manifest.

Disclaimer: This isn’t a tutorial on building an Orange Pi cluster, or even setting up K3s. It’s a record of what I tried, what worked, what broke, and what I learned when deploying MariaDB on Kubernetes. This article ignores best practices and security in favor of simplicity and brevity of code. The setup presented here helps you to get started with the MariaDB Kubernetes Operator so you can continue your exploration with the links provided at the end of the article.

Info: The MariaDB Kubernetes Operator has been in development since 2022 and is steadily growing in popularity. It’s also Red Hat OpenShift Certified and available as part of MariaDB Enterprise. Galera is a synchronous multi-primary cluster solution that enables high availability and data consistency across MariaDB nodes.

Stripping K3s Down to the Essentials

First of all, I installed K3s (a certified Kubernetes distribution built for IoT and edge computing) on the control node as follows (ssh into the control node):

Shell
curl -sfL https://get.k3s.io | \
  INSTALL_K3S_EXEC="--disable traefik \
  --disable servicelb \
  --disable cloud-controller \
  --disable network-policy" \
  sh -s - server --cluster-init

These flags strip out components I didn't need:

traefik: No need for HTTP ingress.
servicelb: I relied on NodePorts instead.
cloud-controller: Irrelevant on bare-metal.
network-policy: Avoided for simplicity and memory savings.

On worker nodes, I installed K3s and joined the cluster with the usual command (replace <control-node-ip> with the actual IP of the control node):

Shell
curl -sfL https://get.k3s.io | \
  K3S_URL=https://<control-node-ip>:6443 \
  K3S_TOKEN=<token> sh -

To be able to manage the cluster from my laptop (macOS), I did this:

Shell
scp orangepi@<master-ip>:/etc/rancher/k3s/k3s.yaml ~/.kube/config
sed -i -e 's/127.0.0.1/<control-node-ip>/g' ~/.kube/config

Windows users can do the same using WinSCP or WSL + scp. And don’t forget to replace <control-node-ip> with the actual IP again.

Installing the MariaDB Operator

Here’s how I installed the MariaDB Kubernetes operator via Helm (ssh into the control node):

Shell
helm repo add mariadb-operator https://helm.mariadb.com/mariadb-operator
helm install mariadb-operator-crds mariadb-operator/mariadb-operator-crds
helm install mariadb-operator mariadb-operator/mariadb-operator

It deployed cleanly with no extra config, and the ARM64 support worked out of the box. Once installed, the operator started watching for MariaDB resources.

The MariaDB Secret

I tried to configure the MariaDB root password in the same manifest file (for demo purposes), but it failed, especially with Galera. I guess the MariaDB servers are initialized before the secret, which makes the startup process fail. So, I just followed the documentation (as one should always do!)
and created the secret via command line: Shell kubectl create secret generic mariadb-root-password --from-literal=password=demo123 I also got the opportunity to speak with Martin Montes (Sr. Software Engineer at MariaDB plc and main developer of the MariaDB Kubernetes Operator). He shared this with me: “If the rootPasswordSecretKeyRef field is not set, a random one is provisioned by the operator. Then, the init jobs are triggered with that secret, which ties the database's initial state to that random secret. To start over with an explicit secret, you can delete the MariaDB resource, delete the PVCs (which contain the internal state), and create a manifest that contains both the MariaDB and the Secret. It should work.” You can find some examples of predictable password handling here. Minimal MariaDB Instance: The Tuning Game My first deployment failed immediately: OOMKilled. The MariaDB Kubernetes Operator is made for real production environments, and it works out of the box on clusters with enough compute capacity. However, in my case, with only 2GB per node, memory tuning was unavoidable. Fortunately, one of the strengths of the MariaDB Kubernetes Operator is its flexible configuration. So, I limited memory usage, dropped buffer pool size, reduced connection limits, and tweaked probe configs to prevent premature restarts. Here’s the config that ran reliably: YAML # MariaDB instance apiVersion: k8s.mariadb.com/v1alpha1 kind: MariaDB metadata: name: mariadb-demo spec: rootPasswordSecretKeyRef: # Reference to a secret containing root password for security name: mariadb-root-password key: password storage: size: 100Mi # Small storage size to conserve resources on limited-capacity SD cards storageClassName: local-path # Local storage class for simplicity and performance resources: requests: memory: 512Mi # Minimum memory allocation - suitable for IoT/edge devices like Raspberry Pi, Orange Pi, and others limits: memory: 512Mi # Hard limit prevents MariaDB from consuming too much memory on constrained devices myCnf: | [mariadb] # Listen on all interfaces to allow external connections bind-address=0.0.0.0 # Disable binary logging to reduce disk I/O and storage requirements skip-log-bin # Set to ~70% of available RAM to balance performance and memory usage innodb_buffer_pool_size=358M # Limit connections to avoid memory exhaustion on constrained hardware max_connections=20 startupProbe: failureThreshold: 40 # 40 * 15s = 10 minutes grace periodSeconds: 15 # check every 15 seconds timeoutSeconds: 10 # each check can take up to 10s livenessProbe: failureThreshold: 10 # 10 * 60s = 10 minutes of failing allowed periodSeconds: 60 # check every 60 seconds timeoutSeconds: 10 # each check can take 10s readinessProbe: failureThreshold: 10 # 10 * 30s = 5 minutes tolerance periodSeconds: 30 # check every 30 seconds timeoutSeconds: 5 # fast readiness check --- # NodePort service apiVersion: v1 kind: Service metadata: name: mariadb-demo-external spec: type: NodePort # Makes the database accessible from outside the cluster selector: app.kubernetes.io/name: mariadb # Targets the MariaDB pods created by operator ports: - protocol: TCP port: 3306 # Standard MariaDB port targetPort: 3306 # Port inside the container nodePort: 30001 # External access port on all nodes (limited to 30000-32767 range) The operator generated the underlying StatefulSet and other resources automatically. I checked logs and resources — it created valid objects, respected the custom config, and successfully managed lifecycle events. 
That level of automation saved time and reduced YAML noise. Info: Set the innodb_buffer_pool_size variable to around 70% of the total memory. Warning: Normally, it is recommended to not set CPU limits. This can make the whole initialization process and the database itself slow (and cause CPU throttling). The trade-off of not setting limits is that it might steal CPU cycles from other workloads running on the same Node. Galera Cluster: A Bit of Patience Required Deploying a 3-node MariaDB Galera cluster wasn’t that difficult after the experience gained from the single-instance deployment — it only required additional configuration and minimal adjustments. The process takes some time to complete, though. So be patient if you are trying this on small SBCs with limited resources like the Orange Pi or Raspberry Pi. SST (State Snapshot Transfer) processes are a bit resource-heavy, and early on, the startup probe would trigger restarts before nodes could sync on these small SBCs already running Kubernetes. I increased probe thresholds and stopped trying to watch the rollout step-by-step, instead letting the cluster come up at its own pace. And it just works! By the way, this step-by-step rollout is designed to avoid downtime: rolling the replicas one at a time, waiting for each of them to sync, proceeding with the primary, and switching over to an up-to-date replica. Also, for this setup, I increased the memory a bit to let Galera do its thing. Here’s the deployment manifest file that worked smoothly: YAML # 3-node multi-master MariaDB cluster apiVersion: k8s.mariadb.com/v1alpha1 kind: MariaDB metadata: name: mariadb-galera spec: replicas: 3 # Minimum number for a fault-tolerant Galera cluster (balanced for resource constraints) replicasAllowEvenNumber: true # Allows cluster to continue if a node fails, even with even number of nodes rootPasswordSecretKeyRef: name: mariadb-root-password # References the password secret created with kubectl key: password generate: false # Use existing secret instead of generating one storage: size: 100Mi # Small storage size to accommodate limited SD card capacity on Raspberry Pi, Orange Pi, and others storageClassName: local-path resources: requests: memory: 1Gi # Higher than single instance to accommodate Galera overhead limits: memory: 1Gi # Strict limit prevents OOM issues on resource-constrained nodes galera: enabled: true # Activates multi-master synchronous replication sst: mariabackup # State transfer method that's more efficient for limited bandwidth connections primary: podIndex: 0 # First pod bootstraps the cluster providerOptions: gcache.size: '64M' # Reduced write-set cache for memory-constrained environment gcache.page_size: '64M' # Matching page size improves memory efficiency myCnf: | [mariadb] # Listen on all interfaces for cluster communication bind-address=0.0.0.0 # Required for Galera replication to work correctly binlog_format=ROW # ~70% of available memory for database caching innodb_buffer_pool_size=700M # Severely limited to prevent memory exhaustion across replicas max_connections=12 affinity: antiAffinityEnabled: true # Ensures pods run on different nodes for true high availability startupProbe: failureThreshold: 40# 40 * 15s = 10 minutes grace periodSeconds: 15 # check every 15 seconds timeoutSeconds: 10 # each check can take up to 10s livenessProbe: failureThreshold: 10 # 10 * 60s = 10 minutes of failing allowed periodSeconds: 60 # check every 60 seconds timeoutSeconds: 10 # each check can take 10s readinessProbe: failureThreshold: 10 # 
10 * 30s = 5 minutes tolerance periodSeconds: 30 # check every 30 seconds timeoutSeconds: 5 # fast readiness check --- # External access service apiVersion: v1 kind: Service metadata: name: mariadb-galera-external spec: type: NodePort # Makes the database accessible from outside the cluster selector: app.kubernetes.io/name: mariadb # Targets all MariaDB pods for load balancing ports: - protocol: TCP port: 3306 # Standard MariaDB port targetPort: 3306 # Port inside the container nodePort: 30001 # External access port on all cluster nodes (using any node IP)

After tuning the values, all three pods reached Running. I confirmed replication was active, and each pod landed on a different node — kubectl get pods -o wide confirmed even distribution.

Info: To ensure that every MariaDB pod gets scheduled on a different Node, set spec.galera.affinity.antiAffinityEnabled to true.

Did Replication Work?

Here’s the basic test I used to check if replication worked:

Shell
kubectl exec -it mariadb-galera-0 -- mariadb -uroot -pdemo123 -e "
  CREATE DATABASE test;
  CREATE TABLE test.t (id INT PRIMARY KEY AUTO_INCREMENT, msg TEXT);
  INSERT INTO test.t(msg) VALUES ('It works!');"
kubectl exec -it mariadb-galera-1 -- mariadb -uroot -pdemo123 -e "SELECT * FROM test.t;"
kubectl exec -it mariadb-galera-2 -- mariadb -uroot -pdemo123 -e "SELECT * FROM test.t;"

The inserted row appeared on all three nodes. I didn’t measure write latency or SST transfer duration — this wasn’t a performance test. For me, it was just enough to confirm functional replication and declare success. Since I exposed the service using a simple NodePort, I was also able to connect to the MariaDB cluster using the following:

Shell
mariadb -h <master-ip> --port 30001 -u root -pdemo123

I skipped Ingress entirely to keep memory usage and YAML code minimal.

What I Learned

The MariaDB Operator handled resource creation pretty well — PVCs, StatefulSets, Secrets, and lifecycle probes were all applied correctly with no manual intervention.
Galera on SBCs is actually possible. SST needs patience, and tuning memory limits is critical, but it works!
Out-of-the-box kube probes often don’t work on slow hardware. Startup times will trip checks unless you adjust thresholds.
Node scheduling worked out fine on its own. K3s distributed the pods evenly.
Failures teach more than success. Early OOM errors helped me understand the behavior of stateful apps in Kubernetes much more than a smooth rollout would’ve.

Final Thoughts

This wasn’t about benchmarks, and it wasn’t for production. For production environments, see this manifest. This article was about shrinking a MariaDB Kubernetes deployment to get it working in a constrained environment. It was also about getting started with the MariaDB Kubernetes Operator and learning what it does for you. The operator simplified a lot of what would otherwise be painful on K8s: it created stable StatefulSets, managed volumes and config, and coordinated cluster state without needing glue scripts or sidecars. Still, it required experimentation on this resource-limited cluster. Probes need care. And obviously, you won’t get resilience or high throughput from an SBC cluster like this, especially if you have a curious dog or cat around your cluster! But this is a worthwhile test for learning and experimentation. Also, if you don’t want to fiddle with SBCs, try Kind or minikube. By the way, the MariaDB Kubernetes Operator can do much more for you. Check this repository to see a list of the possibilities.
Here are just a few worth exploring next:

Multiple HA modes: Galera Cluster or MariaDB Replication.
Advanced HA with MaxScale: a sophisticated database proxy, router, and load balancer for MariaDB.
Flexible storage configuration.
Volume expansion.
Take, restore and schedule backups.
Cluster-aware rolling update: roll out replica Pods one by one, wait for each of them to become ready, and then proceed with the primary Pod, using ReplicasFirstPrimaryLast.
Issue, configure and rotate TLS certificates and CAs.
Orchestrate and schedule SQL scripts.
Prometheus metrics via mysqld-exporter and maxscale-exporter.
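If you prefer to verify the cluster from application code rather than the mariadb CLI, the sketch below shows one way to do it with MariaDB Connector/J over the NodePort service. It assumes the mariadb-java-client driver is on the classpath and reuses the demo credentials and test table from this article; <node-ip> is a placeholder for any cluster node.

Java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class GaleraSmokeTest {
    public static void main(String[] args) throws Exception {
        // <node-ip> is any cluster node; 30001 is the NodePort from the Service manifest.
        String url = "jdbc:mariadb://<node-ip>:30001/test";
        try (Connection conn = DriverManager.getConnection(url, "root", "demo123");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT id, msg FROM t")) {
            while (rs.next()) {
                System.out.println(rs.getInt("id") + ": " + rs.getString("msg"));
            }
        }
    }
}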
Introduction Modern software architectures are increasingly embracing microservices to improve scalability, flexibility, and resilience. However, as the number of systems expands, managing inter-service communication, data persistence, event-driven messaging, and security becomes more complex. Additionally, as a product scales, organizations often inadvertently develop strong dependencies on specific database providers, messaging middleware, or cloud vendors. This tight coupling makes future changes challenging, often requiring extensive refactoring. Dapr (Distributed Application Runtime) offers a unified abstraction for handling these concerns, allowing microservices to interact with databases, message queues, APIs, and secrets stores in a cloud-agnostic and infrastructure-independent manner. Figure 1: How Dapr works This article explores how Dapr simplifies microservices orchestration, using an Order Management System (OMS) as an example. We'll demonstrate: Database access for state managementEvent-driven messaging for data processing across servicesService-to-service invocation for inter-service communicationSecure secrets management for handling credentials Figure 2: How Dapr simplifies microservices orchestration Managing State Without Tight Coupling One of the fundamental needs in microservices is persistent storage. Instead of using a database SDK tied to a specific provider, Dapr provides a state management API that works across multiple databases such as PostgreSQL, DynamoDB, and Redis. Configuration To enable database access, we configure Dapr to use AWS DynamoDB by creating a component file as seen below: YAML apiVersion: dapr.io/v1alpha1 kind: Component metadata: name: orderstatestore namespace: default spec: type: state.aws.dynamodb version: v1 metadata: - name: region value: us-east-1 - name: table value: OrdersTable - name: partitionKey value: orderId This configuration tells Dapr to use DynamoDB as the storage backend. Saving and Retrieving Data via Dapr API Instead of integrating directly with AWS SDKs, our order service interacts with the database via Dapr’s state API: Java import io.dapr.client.DaprClient; import io.dapr.client.DaprClientBuilder; import org.springframework.stereotype.Service; @Service public class OrderService { private static final String STATE_STORE_NAME = "orderstatestore"; private final DaprClient daprClient; public OrderService() { this.daprClient = new DaprClientBuilder().build(); } public void createOrder(Order order) { //Blocking (Synchronous) Approach daprClient.saveState(STATE_STORE_NAME, order.getOrderId(), order).block(); } public Order getOrder(String orderId) { return daprClient.getState(STATE_STORE_NAME, orderId, Order.class).block().getValue(); } } Using Dapr’s state API, the underlying database is abstracted, enabling seamless migration. This eliminates the need for AWS-specific configurations within the application code, allowing developers to switch databases without modifying the business logic. Pub/Sub Messaging: Event-Driven Data Processing Many microservices follow event-driven architectures where services communicate via message brokers. Instead of integrating directly with Kafka, RabbitMQ or AWS SNS/SQS, Dapr provides a generic pub/sub API. 
Configuring To enable event-driven messaging, we configure Dapr to use AWS SNS as seen below: YAML apiVersion: dapr.io/v1alpha1 kind: Component metadata: name: orderspubsub namespace: default spec: type: pubsub.aws.sns version: v1 metadata: - name: region value: us-east-1 - name: topic value: orderCreatedTopic Publishing Events Once an order is created, we publish an event to AWS SNS without directly using AWS SDKs. This enables downstream applications to trigger subsequent processes, such as shipping and billing. Java import io.dapr.client.DaprClient; import io.dapr.client.DaprClientBuilder; import org.springframework.stereotype.Service; @Service public class OrderEventPublisher { private final DaprClient daprClient; public OrderEventPublisher() { this.daprClient = new DaprClientBuilder().build(); } public void publishOrderCreatedEvent(Order order) { //Publish as Fan-out message for point to point use invokeMethod daprClient.publishEvent("orderspubsub", "orderCreatedTopic", order).block(); } } Subscribing Events Create a Dapr subscription file (order-subscription.yaml) so that the service can listen for the order-created event: YAML apiVersion: dapr.io/v1alpha1 kind: Subscription metadata: name: order-subscription spec: pubsubname: orderspubsub topic: orderCreatedTopic route: /orders scopes: - payment-service The payment service listens for order events: Java import org.springframework.web.bind.annotation.*; @RestController public class PaymentEventListener { @Topic(name = "orderCreatedTopic", pubsubName = "orderspubsub") @PostMapping("/orders") public void handleOrderEvent(@RequestBody Order order) { System.out.println("Processing payment for Order ID: " + order.getOrderId()); // Implement further processing (e.g., triggering shipping) } } This decouples order and payment services, allowing them to scale independently. Service Invocation Instead of using hardcoded URLs like traditional REST APIs, Dapr allows microservices to communicate dynamically. The payment service retrieves order details via Dapr without knowing its exact hostname/IP: Java import io.dapr.client.DaprClient; import io.dapr.client.DaprClientBuilder; import org.springframework.stereotype.Service; @Service public class PaymentService { private final DaprClient daprClient; public PaymentService() { this.daprClient = new DaprClientBuilder().build(); } public Order getOrderDetails(String orderId) { return daprClient.invokeMethod("orderservice", "orders/" + orderId, null, Order.class).block(); } } Services do not need to handle discovery or manage hardcoded addresses, as Dapr automatically takes care of networking. Secrets Management Instead of storing credentials in environment variables or application properties, Dapr provides a secrets management API, enabling secure retrieval of secrets from providers like AWS Secrets Manager or HashiCorp Vault. 
Configuring Below is how to configure this using Dapr: YAML apiVersion: dapr.io/v1alpha1 kind: Component metadata: name: aws-secrets namespace: default spec: type: secretstores.aws.secretsmanager version: v1 metadata: - name: region value: us-east-1 Retrieving Secrets The order service securely retrieves credentials via Dapr’s secret store API: Java import io.dapr.client.DaprClient; import io.dapr.client.DaprClientBuilder; import org.springframework.stereotype.Service; import java.util.Map; @Service public class SecretService { private final DaprClient daprClient; public SecretService() { this.daprClient = new DaprClientBuilder().build(); } public Map<String, String> getDatabaseSecrets() { return daprClient.getSecret("aws-secrets", "dbPassword").block(); } } This ensures credentials are securely stored and accessed only when needed. Conclusion Dapr streamlines microservice orchestration with a unified, cloud-agnostic abstraction for database access, messaging, service invocation, and secrets management. It supports polyglot architectures, enabling seamless interaction across different programming languages without infrastructure dependencies. By integrating database and messaging components, developers can build scalable, maintainable systems without vendor lock-in. With built-in features like circuit breakers, retries, and observability, Dapr enhances resilience, reduces complexity, and allows services to evolve independently. By abstracting infrastructure concerns, it enables teams to focus on business logic, accelerating development and supporting scalable, distributed systems across any cloud or hybrid environment.
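One closing note: the snippets in this article call .block() on the Dapr client for simplicity. Because the Dapr Java SDK is built on Project Reactor, the same calls can also be composed without blocking. The sketch below is a hedged, illustrative non-blocking variant of the order flow; it reuses the Order class from the earlier snippets and is not part of the original example.

Java
import io.dapr.client.DaprClient;
import io.dapr.client.DaprClientBuilder;
import reactor.core.publisher.Mono;

public class ReactiveOrderService {

    private static final String STATE_STORE_NAME = "orderstatestore";
    private static final String PUBSUB_NAME = "orderspubsub";
    private static final String TOPIC = "orderCreatedTopic";

    private final DaprClient daprClient = new DaprClientBuilder().build();

    // Save the order and then publish the event, without blocking the calling thread.
    public Mono<Void> createOrder(Order order) {
        return daprClient.saveState(STATE_STORE_NAME, order.getOrderId(), order)
                .then(daprClient.publishEvent(PUBSUB_NAME, TOPIC, order));
    }
}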
DevOps practices can require software to be released fast, sometimes with multiple deployments throughout the day. This is critical to DevOps, and to accomplish it, developers must test in minutes to determine whether software will move forward, be sent back to the drawing board, or be canned altogether. Identifying and correcting bugs prior to production is essential to the Software Development Life Cycle (SDLC), and testing should play a part in all processes. During the test phase, integrating automated testing when possible is critical, with the choice of approach tailored to the specific application’s structure. This could involve focusing on public methods for APIs, verifying code and components, or implementing comprehensive end-to-end (E2E) assessments. A thorough testing process covers all aspects: units or methods, integration between internal system components, and the frontend and backend parts. Further, structured test management systems help provide comprehensive reporting and clear communication about outcomes and development progress. This keeps the entire team informed and aligned with the application’s ongoing status. Yet, no matter the deadline or tool used, an organization must be hyperfocused on quality. With this in mind, testing should no longer be exclusive to Quality Assurance (QA) teams: engineers should participate and be held accountable as well. Shared responsibility delivers consistently reliable results, weeding out issues before they take root and waste resources. Quality is a responsibility of the whole R&D organization, working closely with business and product teams. Automation speeds up cycles and mitigates human error, and automated testing identifies defects in code or in the integration between different system components. A good example is a software bug introduced by a code change and discovered by unit tests during the automated testing phase; or a configuration change that results in a missing HTTP header and breaks the integration tests.

The Shape of Testing

Each step of the SDLC requires its own forms of testing. This includes:

Unit tests for individual components/units of work
Integration tests for vetting relations between components of a binary package or between the frontend and backend of an application
Database tests to evaluate accuracy and reliability of database systems
E2E tests to verify the whole system is built according to business requirements and user journeys, which reduces backtracking in the event an error has been identified

Build failure notifications and log processing speed directly depend on the specific CI/CD pipeline toolchain implemented in your development environment. Also important are the frameworks you have integrated and the quality of the error handling mechanism, which should be able to identify errors within minutes. These testing layers are aimed at addressing an app’s functionality, performance, and reliability. The Test Pyramid provides developers with a framework that can effectively guide processes and shape their testing. Unit tests focus on a single method or component. They’re quick, affordable, and a good first pass for ensuring code quality. They are written in the same language as the code they test, usually stored close by, and maintained like the application code, following all SDLC procedures. It’s important that these tests run in the build stage, prior to any code deployment, so a broken build is caught before unnecessary steps are taken.
If the code is broken, the tests will flag the build as a failure and prevent the next steps in the pipeline. Next are the integration and API tests. These ensure the components and different layers of the application are working as expected and communicating with each other under expected scenarios. For this type of test we can use different frameworks and languages: it’s not necessary to use the same language in which your application is written. It’s also important to understand that to run these you must be able to deploy the code first, as many require the use of publicly available methods. Then there are user interface (UI) E2E tests, the most encompassing of all, analyzing system integrations across the frontend, backend, databases, and networking. These are usually created by QA, who work with all lines of business and individual product owners. These tests are the costliest, consuming the most time and maintenance, particularly as business needs grow. Traditional testing approaches often rely on the standard Test Pyramid (as seen above), which prioritizes unit tests at the base, followed by integration tests, with a small apex of E2E tests. However, an atypical or "inverted" Test Pyramid emerges when teams overemphasize E2E testing. This antipattern creates a "leaking ice cream cone" architecture where:

E2E tests dominate the testing strategy
Lower-level tests become sparse
Maintenance complexity escalates exponentially
Resource consumption increases disproportionately

The result is a testing approach that's fragile against business requirement changes, computationally expensive, and strategically misaligned with efficient software development principles. Testing apps manually isn’t easy and consumes a lot of time and money. Testing complex ones with frequent releases requires an enormous number of human hours when attempted manually. This will affect the release cycle, results will take longer to appear, and if they show a failure, you’ll need to conduct another round of testing. What’s more, the chances of doing it correctly, repeatedly, and without any human error are slim. Those factors have driven the adoption of automation throughout all phases of the testing process, ranging from infrastructure builds to actual testing of code and applications. As for who should write which tests, as a general rule of thumb, it’s a task best suited to software engineers. They should create unit and integration tests as well as UI E2E tests. QA analysts should also be tasked with writing UI E2E test scenarios together with individual product owners. QA teams collaborating with business owners enhance product quality by aligning testing scenarios with real-world user experiences and business objectives. The test discovery phase, typically conducted manually, establishes the foundation for UI end-to-end tests, which are often implemented using the Gherkin language. Gherkin, a structured syntax for behavior-driven development (BDD), follows a specific format:

Given (initial context or preconditions)
When (action or event occurs)
Then (expected outcome or result)

This structure allows testers to define clear, readable scenarios that bridge the gap between business requirements and technical implementation. Gherkin's “Given-When-Then” format facilitates effective communication between stakeholders and developers, ensuring test cases accurately reflect desired behaviors and user stories gathered during the discovery phase.
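To show how a Given-When-Then scenario becomes executable, here is a minimal, illustrative sketch of Cucumber step definitions in Java. It assumes cucumber-java and JUnit are on the classpath; the scenario, class, and method names are hypothetical examples rather than code from this article.

Java
import io.cucumber.java.en.Given;
import io.cucumber.java.en.When;
import io.cucumber.java.en.Then;

import static org.junit.jupiter.api.Assertions.assertEquals;

// Step definitions for a hypothetical scenario:
//   Given a registered user with an empty cart
//   When the user adds 2 items to the cart
//   Then the cart total shows 2 items
public class CartStepDefinitions {

    private final ShoppingCart cart = new ShoppingCart(); // illustrative domain class below

    @Given("a registered user with an empty cart")
    public void a_registered_user_with_an_empty_cart() {
        cart.clear();
    }

    @When("the user adds {int} items to the cart")
    public void the_user_adds_items_to_the_cart(int count) {
        for (int i = 0; i < count; i++) {
            cart.addItem("sample-item");
        }
    }

    @Then("the cart total shows {int} items")
    public void the_cart_total_shows_items(int expected) {
        assertEquals(expected, cart.itemCount());
    }
}

// Tiny in-memory stand-in so the example is self-contained.
class ShoppingCart {
    private final java.util.List<String> items = new java.util.ArrayList<>();
    void clear() { items.clear(); }
    void addItem(String name) { items.add(name); }
    int itemCount() { return items.size(); }
}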
Many testing frameworks support this format, and it has proven very efficient and easily convertible into executable steps, making it a go-to tool for test design.

A Case for Testing

Testing can be carried out successfully with readily available tools and services. Amazon Web Services (AWS), one of the most widely used today, is a good case in point. AWS CodePipeline provides completely managed continuous delivery that creates pipelines and orchestrates and updates infrastructure and apps. It also works well with other crucial AWS DevOps services, while integrating with third-party action providers like Jenkins and GitHub. As a result, AWS CodePipeline provides many vital capabilities and functions, alongside scalability and cost efficiency. Here are the advantages you can expect with AWS CodePipeline:

- Enables automated software release workflows
- Seamless connection with AWS and third-party services
- Easy configuration and real-time status tracking
- Adapts to complex deployment requirements
- Integrated with AWS Identity and Access Management (IAM)
- Pay only for the pipeline actions actually executed

AWS CodePipeline offers a change-detection option that can kick off a pipeline based on the source location of the artifacts. This is particularly useful for tasks such as function descriptions and risk assessments. When it comes to leveraging those stored artifacts, AWS encourages using GitHub webhooks along with Amazon CloudWatch Events. The tool also has a "disable transition" feature that connects pipeline stages and can be used as a default. To keep a pipeline from automatically advancing, you simply hit a button and activities cease. AWS CodePipeline allows pipeline edits for starting, updating, or completely removing stages as well. An edit page lets users add actions in series or alongside ongoing activities. This brings added flexibility to a pipeline and can better scale growth. When it comes to management, an approval action feature offers firm oversight of stages. For instance, if someone tasked with approval has not weighed in, the pipeline will pause until they do so. (A minimal boto3 sketch of these transition and approval operations appears after the architecture description below.) Finally, AWS CodeBuild and CodePipeline work together seamlessly to create a powerful continuous integration and continuous delivery (CI/CD) pipeline. You can have multiple CodeBuild actions within a single CodePipeline stage or across different stages. This allows for parallel builds, different build environments, or separate build and test actions. The following is a brief example of the orchestration, using some demo code that simulates the deployment of the application and includes the test phase. This is more like pseudocode with guidelines rather than a completed solution, but it could easily be converted into a running pipeline with some changes and adaptations. For simplicity, I will use AWS Elastic Beanstalk as the service to host the application. So let's describe the architecture of our solution:

- Source: GitHub repository (we are going to pull the application)
- Build: AWS CodeBuild (we are going to build and run the unit tests); the artifacts will be stored on S3
- Deploy: AWS Elastic Beanstalk (our hosting service on AWS)
- Testing: AWS CodeBuild for E2E tests (executing these tests against the newly deployed version of our application)

*I used a text-to-diagram tool called Eraser for this sketch. Very easy prompting, and you can manually edit the results.
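Before walking through the build-out, here is a minimal boto3 sketch of the transition and approval controls described above; the pipeline and stage names match the demo configuration used later in this article, while the ManualApproval action name is a hypothetical addition shown only for illustration:

Python

# pipeline_ops.py -- a minimal boto3 sketch (names are placeholders from the demo config)
import boto3

codepipeline = boto3.client("codepipeline")

# Kick off a run of the pipeline manually.
codepipeline.start_pipeline_execution(name="web-app-pipeline")

# Pause the pipeline before the Deploy stage by disabling its inbound transition.
codepipeline.disable_stage_transition(
    pipelineName="web-app-pipeline",
    stageName="Deploy",
    transitionType="Inbound",
    reason="Hold deployments during the release freeze",
)

# Re-enable the transition when you are ready to deploy again.
codepipeline.enable_stage_transition(
    pipelineName="web-app-pipeline",
    stageName="Deploy",
    transitionType="Inbound",
)

# Respond to a manual approval action; the token comes from
# get_pipeline_state() or from the approval notification.
state = codepipeline.get_pipeline_state(name="web-app-pipeline")
# ... locate the approval token for the pending action in `state`, then:
# codepipeline.put_approval_result(
#     pipelineName="web-app-pipeline",
#     stageName="Deploy",
#     actionName="ManualApproval",          # hypothetical approval action
#     result={"summary": "Reviewed and approved", "status": "Approved"},
#     token=approval_token,
# )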
Assuming we store our code in a version control system such as GitHub, an action (a code commit) on the code repository will trigger the source stage of the AWS CodePipeline, and the latest code will be pulled. During the build phase, we compile the application code and execute unit tests to validate component functionality and ensure code quality. Then we advance to the deploy stage, with AWS Elastic Beanstalk as the deploy provider, after which we move on to the testing stage and run the E2E test suite against the newly deployed version of the application. The testing stage gates approval of the deployment and reports the test results. In case of a failure, we then have to take care of the rollback, using the AWS Elastic Beanstalk configuration. To manually provision a pipeline and the components needed to complete the orchestration, follow these steps.

The CodePipeline setup:

- Inside your AWS console, search for CodePipeline and open the service
- Click "Create pipeline"
- Choose "Build custom pipeline" and click "Next"
- Add the name of your pipeline
- Select "V2" for the pipeline type
- Choose the service role (new or existing)
- Click "Next"

Next, configure the Source stage:

- Select GitHub (version 2) as the source provider (assuming you use GitHub)
- Click "Connect to GitHub"
- Select the repository and the branch
- Click "Next"

Now the Build stage:

- Use "AWS CodeBuild" as your build provider
- Select the Region
- Click "Create project," name it, configure the build environment, and add the buildspec configuration file
- Add the input artifact
- Add the output artifact
- Click "Next"

Deploy stage:

- Select "AWS Elastic Beanstalk" as your deploy provider
- Select the Region
- Provide the application name, environment name, and artifact
- Enable automatic rollback under "Advanced"
- Click "Next"

E2E Tests stage:

- Click "Add stage"
- Add a name
- Click "Add action group"
- Add an action name
- Select "AWS CodeBuild" as the action provider
- Add the input artifact
- Select the CodeBuild project or create a new one
- Configure the test environment and buildspec

Now you should review and create your pipeline. To complete it, you will need to finish the build project in AWS CodeBuild. To configure your Elastic Beanstalk environment, make sure you have all health checks and monitoring set up. Next up are the configuration files for the builds and AWS CodePipeline. Consider these as examples, because your specific workload could use different frameworks and languages, so deployment strategies and destination services could vary.
JSON

codepipeline-config.json:

{
  "pipeline": {
    "name": "web-app-pipeline",
    "roleArn": "arn:aws:iam::account:role/service-role/pipeline-role",
    "artifactStore": {
      "type": "S3",
      "location": "my-pipeline-artifact-bucket"
    },
    "stages": [
      {
        "name": "Source",
        "actions": [
          {
            "name": "Source",
            "actionTypeId": {
              "category": "Source",
              "owner": "AWS",
              "provider": "CodeStarSourceConnection",
              "version": "1"
            },
            "configuration": {
              "ConnectionArn": "arn:aws:codestar-connections:region:account:connection/xxx",
              "FullRepositoryId": "owner/repo",
              "BranchName": "main"
            },
            "outputArtifacts": [ { "name": "SourceCode" } ]
          }
        ]
      },
      {
        "name": "Build",
        "actions": [
          {
            "name": "BuildAndTest",
            "actionTypeId": {
              "category": "Build",
              "owner": "AWS",
              "provider": "CodeBuild",
              "version": "1"
            },
            "configuration": { "ProjectName": "web-app-build" },
            "inputArtifacts": [ { "name": "SourceCode" } ],
            "outputArtifacts": [ { "name": "BuildOutput" } ]
          }
        ]
      },
      {
        "name": "Deploy",
        "actions": [
          {
            "name": "Deploy",
            "actionTypeId": {
              "category": "Deploy",
              "owner": "AWS",
              "provider": "ElasticBeanstalk",
              "version": "1"
            },
            "configuration": {
              "ApplicationName": "web-application",
              "EnvironmentName": "web-app-prod"
            },
            "inputArtifacts": [ { "name": "BuildOutput" } ]
          }
        ],
        "onFailure": { "result": "ROLLBACK" }
      },
      {
        "name": "E2ETest",
        "actions": [
          {
            "name": "E2ETests",
            "actionTypeId": {
              "category": "Test",
              "owner": "AWS",
              "provider": "CodeBuild",
              "version": "1"
            },
            "configuration": { "ProjectName": "web-app-e2e" },
            "inputArtifacts": [ { "name": "SourceCode" } ]
          }
        ]
      }
    ]
  }
}

YAML

# buildspec.yml for the main build
# a snippet of the phases for the build stage
# code versions and commands are for example only
version: 0.2
phases:
  install:
    runtime-versions:
      nodejs: 18
  pre_build:
    commands:
      - npm install
  build:
    commands:
      - npm run build
      - npm run test
  post_build:
    commands:
      - echo Build completed
artifacts:
  files:
    - '**/*'
  base-directory: 'dist'

YAML

# buildspec-e2e.yml for the E2E tests
# a snippet for the e2e tests
# code versions and commands are for example only
version: 0.2
phases:
  install:
    runtime-versions:
      nodejs: 18
  pre_build:
    commands:
      - npm install
  build:
    commands:
      - npm run e2e-tests

In addition, you have to create IAM service roles for CodeBuild and CodePipeline.
These should look like the following.

CodePipeline service role policy example:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject*",
        "s3:PutObject",
        "s3:GetBucketVersioning"
      ],
      "Resource": [
        "arn:aws:s3:::my-pipeline-artifact-bucket/*",
        "arn:aws:s3:::my-pipeline-artifact-bucket"
      ]
    },
    {
      "Effect": "Allow",
      "Action": "codestar-connections:UseConnection",
      "Resource": "${ConnectionArn}"
    },
    {
      "Effect": "Allow",
      "Action": [
        "codebuild:StartBuild",
        "codebuild:BatchGetBuilds"
      ],
      "Resource": "arn:aws:codebuild:${region}:${account}:project/web-app-*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "elasticbeanstalk:CreateApplicationVersion",
        "elasticbeanstalk:DescribeApplicationVersions",
        "elasticbeanstalk:DescribeEnvironments",
        "elasticbeanstalk:UpdateEnvironment"
      ],
      "Resource": "*"
    }
  ]
}

CodeBuild service role policy example:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": "arn:aws:logs:region:account:log-group:/aws/codebuild/*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:GetObjectVersion",
        "s3:PutObject"
      ],
      "Resource": [
        "arn:aws:s3:::my-pipeline-artifact-bucket/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "ecr:GetAuthorizationToken",
        "ecr:BatchCheckLayerAvailability",
        "ecr:GetDownloadUrlForLayer",
        "ecr:BatchGetImage"
      ],
      "Resource": "*"
    }
  ]
}

By automating your release process, CodePipeline helps improve your team's productivity, increases the speed of delivery, and enhances the overall quality and reliability of your software releases.

This Isn't Just a Test

The blowback that can come when an untested, buggy app makes its way into mainstream use can be severe. And there's virtually no reason that should happen when proven technology for testing exists. I think we all heard about the plane crash in Ethiopia in 2019, involving a Boeing 737 MAX with 157 people aboard. There was extensive investigation, and some findings pointed to a lack of testing and to human factors. This included incomplete testing protocols, insufficient simulation testing, and poorly implemented safety measures and risk-mitigation testing. The lack of a proper, comprehensive specification was the factor that led to these incomplete procedures. As a result, lives were lost. Today, in most cases, the failure to implement proper specs and comprehensive testing will not end in a tragedy like this, but it will impact the business and potentially cause plenty of unwanted expenses or losses. So focus attention on tools that automate functions and take human error out of the equation. Equally important, be sure everyone involved in development has a role in testing, whether they're coders, engineers, or in QA. Shared responsibility across your organization ensures everyone has a stake in its success. The goal should be to deliver business value, and your tools should support this. After all, this isn't just a test, and it isn't just the toolset; it's the way to drive DevOps for business success.
Learn how to implement manual sharding in native PostgreSQL, an approach that can later be extended across nodes with Foreign Data Wrappers. This tutorial walks through creating distributed tables without additional extensions like Citus.

The Challenge With Database Scaling

As applications grow, single-node databases face several challenges:

- Limited storage capacity on a single machine
- Query performance degradation with growing datasets
- Higher concurrency demands exceeding CPU capabilities
- Difficulty maintaining acceptable latency for global users

Sharding — horizontally partitioning data across multiple database nodes — offers a solution to these scaling problems.

Why Manual Sharding in PostgreSQL?

While solutions like Citus and other distributed database systems exist, there are compelling reasons to implement manual sharding:

- More control: Customize the sharding logic to your specific application needs.
- No additional dependencies: Utilize only native PostgreSQL features.
- Learning opportunity: Gain a deeper understanding of distributed database concepts.
- Incremental adoption: Apply sharding only to specific high-volume tables.
- Cloud-agnostic: Implement your solution on any infrastructure.

Setting Up Our Sharded Architecture

Let's implement a simplified manual sharding approach that works with a single PostgreSQL instance. This makes it easier to test and understand the concept before potentially scaling to multiple instances.

Step 1: Create Sharded Tables

First, let's create our sharded tables in a single PostgreSQL database:

SQL

CREATE TABLE users_shard1 (
  id BIGINT PRIMARY KEY,
  name TEXT
);

CREATE TABLE users_shard2 (
  id BIGINT PRIMARY KEY,
  name TEXT
);

Note that we're using the BIGINT type for IDs to better handle large data volumes.

Step 2: Index the Shards for Better Performance

Adding indexes improves query performance, especially for our routing functions (the same index is created on each shard):

SQL

CREATE INDEX idx_user_id_shard1 ON users_shard1(id);
CREATE INDEX idx_user_id_shard2 ON users_shard2(id);

Step 3: Implement Insert Function With Routing Logic

This function routes data to the appropriate shard based on a simple modulo algorithm:

SQL

CREATE OR REPLACE FUNCTION insert_user(p_id BIGINT, p_name TEXT) RETURNS VOID AS $$
BEGIN
  IF p_id % 2 = 0 THEN
    INSERT INTO users_shard2 VALUES (p_id, p_name);
  ELSE
    INSERT INTO users_shard1 VALUES (p_id, p_name);
  END IF;
END;
$$ LANGUAGE plpgsql;

Step 4: Create Read Function With Routing Logic

For reading data, we'll create a function that routes queries to the appropriate shard:

SQL

CREATE OR REPLACE FUNCTION read_user(p_id BIGINT) RETURNS TABLE(id BIGINT, name TEXT) AS $$
BEGIN
  IF p_id % 2 = 0 THEN
    RETURN QUERY SELECT u.id::BIGINT, u.name FROM users_shard2 u WHERE u.id = p_id;
  ELSE
    RETURN QUERY SELECT u.id::BIGINT, u.name FROM users_shard1 u WHERE u.id = p_id;
  END IF;
END;
$$ LANGUAGE plpgsql;

Notice we use aliasing and explicit casting to handle any potential type mismatches.

Step 5: Create a Unified View (Optional)

To make queries transparent, create a view that unions the sharded tables:

SQL

CREATE VIEW users AS
SELECT * FROM users_shard1
UNION ALL
SELECT * FROM users_shard2;

Step 6: Testing Our Sharded System

Let's test our system with a few simple inserts:

SQL

SELECT insert_user(1, 'Alice');
SELECT insert_user(2, 'Bob');
SELECT insert_user(3, 'Carol');
SELECT insert_user(4, 'Dave');

Now, read the data using our routing function:

SQL

SELECT * FROM read_user(1);
SELECT * FROM read_user(2);

Or query all data using the unified view:

SQL

SELECT * FROM users ORDER BY id;

Benchmarking Our Sharding Implementation

Let's benchmark our implementation to understand the performance characteristics.
We'll use Python scripts to test both insertion and read performance.

Python Benchmark Script for Inserts (Sharded)

Here's the script for benchmarking inserts into our sharded tables (seed_pg_sharded.py):

Python

import psycopg2
from time import time

conn = psycopg2.connect("dbname=postgres user=postgres password=secret host=localhost port=5432")
cur = conn.cursor()

start = time()
for i in range(5, 100_001):  # start after the IDs already inserted in Step 6
    cur.execute("SELECT insert_user(%s, %s)", (i, f'user_{i}'))
conn.commit()
end = time()

print("Sharded insert time:", end - start)

Python Benchmark Script for Inserts (Single Table)

For comparison, we'll also test insertion performance on a single table (seed_pg_single.py):

Python

import psycopg2
from time import time

conn = psycopg2.connect("dbname=postgres user=postgres password=secret host=localhost port=5432")
cur = conn.cursor()

start = time()
for i in range(100_001, 100_001 + 500_000):
    cur.execute("INSERT INTO users_base VALUES (%s, %s)", (i, f'user_{i}'))
conn.commit()
end = time()

print("Single-node insert time:", end - start)

Python Benchmark Script for Reads

Finally, we'll compare read performance between the single table and our sharded implementation (read_bench.py):

Python

import psycopg2
from time import time

# Configs
conn = psycopg2.connect("dbname=postgres user=postgres password=secret host=localhost port=5432")

def time_reads(cur, query, param_fn, label):
    start = time()
    for i in range(1000, 2000):  # Run 1000 point queries
        cur.execute(query, (param_fn(i),))
        cur.fetchall()
    end = time()
    print(f"{label}: {end - start:.3f} sec for 1000 point reads")

# Benchmark single table
with conn.cursor() as cur:
    print("Benchmarking Point Reads on Single Table")
    time_reads(cur, "SELECT * FROM users WHERE id = %s", lambda x: x, "Single Table")

# Benchmark sharded read_user function
with conn.cursor() as cur:
    print("\nBenchmarking Point Reads via read_user() Function")
    time_reads(cur, "SELECT * FROM read_user(%s)", lambda x: x, "Sharded Function")

conn.close()

Adding Range Read Functionality

For more complex queries, we can add a function to read a range of IDs:

SQL

CREATE OR REPLACE FUNCTION read_user_range(start_id BIGINT, end_id BIGINT)
RETURNS TABLE(id BIGINT, name TEXT) AS $$
BEGIN
  -- Query from both shards and union the results
  RETURN QUERY
    (SELECT u.id::BIGINT, u.name FROM users_shard1 u WHERE u.id BETWEEN start_id AND end_id)
    UNION ALL
    (SELECT u.id::BIGINT, u.name FROM users_shard2 u WHERE u.id BETWEEN start_id AND end_id)
    ORDER BY 1;
END;
$$ LANGUAGE plpgsql;

This function allows us to read a range of users across both shards in a single query.

Performance Observations

Run the insert and read scripts above against both the single table and the sharded setup to compare the key performance patterns of manual sharding in your own environment.

Conclusion

Manual sharding in PostgreSQL offers a powerful approach to horizontal scalability without requiring third-party extensions like Citus. Using a combination of function-based routing and separate tables, we can distribute data efficiently while maintaining a unified interface for our application.
Instead of a monolith, build your first Java microservice with Dropwizard. Hello, my fellow programmers! I'm positive you do not want to read another complex article on how to build Java microservices. We are going to take a look at Dropwizard today. It is fairly convenient, as it has everything loaded in it, i.e., Jetty, Jersey, Jackson, etc., and it also lets you write your business logic without the boilerplate.

The Best Parts About Using Dropwizard for Microservices

Let's be clear on a few things: Dropwizard is not a new framework that was made yesterday. It has already undergone testing. It's an impressive supplement that puts together many libraries. To put it another way: Imagine trying to build a race car. You expected to weld the parts together yourself, but a pre-tuned engine was handed to you instead. That is exactly what Dropwizard is. What's good about Dropwizard:

- Everything you need is built into it: the HTTP server (Jetty), REST (Jersey), JSON (Jackson), and everything else comes pre-installed and wired up.
- Your service runs as a fat JAR with an embedded server, aka zero app server drama.
- You can deploy it anywhere Java is supported.
- You don't have to rely on anyone to get it ready to use, because it comes equipped with health checks, metrics, and logging — features that loyal Dropwizard fans have come to love.

In my experience, Dropwizard is the best choice when you need to ship fast while keeping operations happy: it is particularly impressive when speed is critical and operational overhead needs to be minimal. There is no yak-shaving with dependency management or sprawling XML configurations.

Prerequisites and Project Setup

Let's first verify something. Having Java 11+ (or whatever the Dropwizard version you're targeting requires) and Maven is a prerequisite. After that, you can do the following. My preferred way is to utilize the Maven archetype. I believe this is the best way to scaffold a project because it does not require any manual arrangement of directories.

Shell

mvn archetype:generate \
  -Dfilter=io.dropwizard.archetypes:java-simple

Manual Maven project (for those who want to learn the insides): Create a standard Maven JAR project (e.g., com.example:hello-world:1.0-SNAPSHOT). Add the following to your pom.xml:

XML

<dependency>
  <groupId>io.dropwizard</groupId>
  <artifactId>dropwizard-core</artifactId>
  <version>4.0.2</version>
</dependency>

That single dependency pulls in Jetty, Jersey, Jackson, Metrics, Logback, and more for you. Uber JAR packaging: Make sure you're building a "fat JAR" so everything's bundled. The archetype sets up Maven Shade for you, but if you're doing it manually, add the Maven Shade Plugin under <build> in your POM to merge dependencies into target/your-app.jar. Once everything's fine with Maven, we can take a look at the configuration.

Configuration: Your Configuration Class and YAML

Setting up Dropwizard applications is easier with YAML, which is preferred over XML. There are no nested structures here! Additionally, the program assumes config.yml exists with a set of things like ports, log levels, database URLs, and additional required configurations (Dropwizard Configuration Reference). Below is a simple illustration:

YAML

server:
  applicationConnectors:
    - type: http
      port: 8080
  adminConnectors:
    - type: http
      port: 8081
logging:
  level: INFO

Dropwizard maps the YAML above onto a Java class you define, extending io.dropwizard.Configuration.
Even if custom configurations are not needed at the beginning, a stub is always required:

Java

public class HelloWorldConfiguration extends Configuration {
    // Add @JsonProperty fields here as needed...
}

As you add details like databaseUrl and template, you will also add fields annotated with @JsonProperty (and validation annotations like @NotEmpty) so that Dropwizard can transform the YAML into a POJO automatically (Dropwizard Configuration Reference).

The Main Event: The Application and Resource Classes

Alright, it's time to put everything together and make things functional.

The Application Class

Your entry point extends Application<T>:

Java

public class HelloWorldApplication extends Application<HelloWorldConfiguration> {

    public static void main(String[] args) throws Exception {
        new HelloWorldApplication().run(args);
    }

    @Override
    public void initialize(Bootstrap<HelloWorldConfiguration> bootstrap) {
        // Add bundles or commands here if needed
    }

    @Override
    public void run(HelloWorldConfiguration config, Environment env) {
        env.jersey().register(new HelloWorldResource());
    }
}

- main initiates Dropwizard.
- initialize is where all of the bundles (once again, think of bundles for things like databases and migrations) get wired in.
- run provides you with Environment, allowing you to register additional resources. These can include Jersey components (env.jersey().register(...)), health checks (env.healthChecks().register(...)), tasks, servlets, filters, etc.

Example Resource

This is our "hello" endpoint.

Java

@Path("/hello")
@Produces(MediaType.APPLICATION_JSON)
public class HelloWorldResource {

    @GET
    public Map<String, String> sayHello() {
        return Collections.singletonMap("message", "Hello, World!");
    }
}

- @Path specifies what the endpoint path will be.
- @Produces is how Dropwizard (using Jackson) is instructed to render JSON.
- In this case, your method returns a Map, and it is safe to assume that Jackson will take care of serializing it. If you want to customize the response even further, you could return a custom, more strongly typed POJO such as HelloMessage.

Building, Running, and Testing

Now it's time to see everything in action.

Shell

mvn clean package

This produces a JAR like target/hello-world-1.0-SNAPSHOT.jar. Config file: Create config.yml in your project root (or resources). You could leave it as an empty {} as well, but personally, I prefer to specify ports because it makes everything predictable. Run the service:

Shell

java -jar target/hello-world-1.0-SNAPSHOT.jar server config.yml

You will notice logs about Jetty starting on app port 8080 and admin port 8081. Test with cURL or a browser:

Shell

curl http://localhost:8080/hello
# {"message":"Hello, World!"}

For the admin endpoints: You can go to http://localhost:8081/healthcheck and http://localhost:8081/metrics to view some ready-built health checks (including a deadlocks check) and metric data (Dropwizard Core). If you need some automated testing, Dropwizard does ship with a dropwizard-testing module, so you can spin up your app on a random port and make real HTTP calls against it — super useful for integration tests (Testing Dropwizard).

Step Up: Health Checks, Databases, Metrics, and More

Basic functions are taken care of, but with real applications, there's more to do.

Database Connection

Dropwizard has modules for Hibernate, JDBI, and JDBC.
For example, load the JDBI module:

XML

<dependency>
  <groupId>io.dropwizard</groupId>
  <artifactId>dropwizard-jdbi3</artifactId>
  <version>${dropwizard.version}</version>
</dependency>

Then, set your data source in YAML and bootstrap a DBI in either initialize or run. It even wires up a health check to ping your database automatically (dropwizard/dropwizard-health — GitHub).

Custom Health Checks

Want to make your own self-tests? Extend com.codahale.metrics.health.HealthCheck:

Java

public class DatabaseHealthCheck extends HealthCheck {
    private final Database db;
    //...

    @Override
    protected Result check() throws Exception {
        return db.ping() ? Result.healthy() : Result.unhealthy("DB unreachable");
    }
}

Register it where you need to within run():

Java

env.healthChecks().register("database", new DatabaseHealthCheck(db));

Viewing your custom checks, plus all the defaults, is done through your admin port's /healthcheck endpoint (Health Checks — Dropwizard Metrics).

Metrics and Monitoring

Measure, don't just log. With a MetricRegistry, count events, time methods, and track histograms. The admin /metrics endpoint serves JSON metrics that can be effortlessly piped into Prometheus or Graphite (Getting Started — Dropwizard).

Bundles and Extensions

Looking for templating, static assets, or authentication? Check out bundles for:

- Freemarker/Mustache templating
- OAuth and JWT auth
- Redis
- Liquibase
- Hibernate

Dropwizard bundles are plugins that hook into initialize, so you avoid scattering boilerplate throughout your code.

Security and HTTPS

Basic authentication and OAuth can be configured with the io.dropwizard.auth modules. Also, configure HTTPS connections with the applicationConnectors settings in YAML. It is as simple as plug-and-play.

From Hello World to Production-Ready: Now That's a Wrap

So, what knowledge do we have now?

- Dropwizard glues together Jetty, Jersey, Jackson, Metrics, Logback, and more into a lean, coherent stack (Home — Dropwizard).
- YAML config + a Configuration subclass keeps setup clean and consistent (Dropwizard Configuration Reference).
- Your Application and resource classes handle wiring, while admin endpoints give you health and metrics out of the box (Getting Started — Dropwizard, Dropwizard Core).
- It's production-ready by default, yet flexible enough to grow with bundles, extensions, and custom tweaks.

Now that you've got a working Dropwizard service, I would recommend that you do the following:

- Persist data: Add a real CRUD implementation on top of Hibernate or JDBI.
- Custom checks: Set up health checks for the services your API heavily relies on.
- Instrument: Track key performance hot spots with @Timed, @Metered, and @Counted.
- Secure: Depending on preference, enable basic auth, OAuth, or JWT for security.
- Deploy: Use your preferred cloud for deployment, Dockerize your fat JAR, and run it on Kubernetes.
It seems every company today is excited about AI. Whether they are rolling out GitHub Copilot to help teams write boilerplate code in seconds or creating internal chatbots to answer support tickets faster than ever, large language models (LLMs) have driven us into a new frontier of productivity very rapidly. Advancements like retrieval-augmented generation (RAG) have let teams plug LLMs into internal knowledge bases, making them context-aware and therefore much more helpful to the end user. However, if you haven't gotten your secrets under control, especially those tied to your growing fleet of non-human identities (NHIs), AI might speed up your security incident rate, not just your team's output. Before you deploy a new LLM or connect Jira, Confluence, or your internal API docs to your internal chat-based agent, let's talk about the real risk hiding in plain sight: secrets sprawl and the world of ungoverned non-human identities.

Non-Human Identities And The Secrets They Hold

NHIs are everywhere in modern DevOps and cloud-native environments. Also known as machine identities, these are digital references used for machine-to-machine access. They can take many different forms, such as service accounts, API keys for CI/CD pipelines, containers running microservices, or even AI agents accessing vector databases or calling APIs. They exist to move data, run tasks, and interact with other systems. Each NHI requires credentials, or secrets, of some form to authenticate and gain the access needed to perform its work. Unlike people, who can use multi-factor authentication or FIDO-based passwordless approaches to prove they really are the correct user of a system, NHIs mostly rely solely on a secret itself to connect. Those secrets tend to sprawl across repos, cloud environments, collaboration tools, and knowledge bases. GitGuardian's 2025 State of Secrets Sprawl report revealed that over 23.7 million secrets were leaked in public GitHub repos. That is not cumulative; that was the number added in just the year 2024. The report also showed that more than 58% of secrets were generic, meaning they did not map to a specific known service or platform. These "generic" secrets are most commonly used by internal services and homegrown NHIs. Compounding the issue of secrets sprawl, NHIs are rarely tied to a single human user. Unlike employees or end users, there often is no offboarding plan for these NHIs. For a lot of systems, that also means their secrets keep on living, essentially, forever. Since NHI access levels must be set up front for most systems, there is also a tendency to widely scope the rights of these identities to allow them to do a range of things, instead of following the principle of least privilege to limit the scope to just barely what is needed. No organization wants any secrets to leak, especially those tied to NHIs, but this is exactly what can happen in a hasty LLM deployment.

When RAG Retrieves A Secret

Early AI models were very limited in what they could actually do, bound only to the topics or specific data sets they were trained on. Retrieval-augmented generation (RAG) removes this limitation by allowing the LLM to go get additional data as needed when prompted. Many companies are rushing to make their internal data sources available to agent-based AI tools. Ideally, this would expose just the needed knowledge and nothing else. However, this is where things can go wrong.
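To make the retrieval mechanics concrete, here is a deliberately simplified Python sketch; the in-memory document store, the keyword "retriever," and the stubbed model call are hypothetical stand-ins for a real vector database and LLM API, and the hardcoded credential mirrors the scenario walked through next:

Python

# toy_rag.py -- a deliberately simplified RAG sketch (all names are hypothetical)

KNOWLEDGE_BASE = {
    "vpn-setup": "Install the VPN client and sign in with your SSO account.",
    "dev-env": "Connect to dev at 10.0.0.12 using root:myp@ssword123!",  # stale page with a hardcoded secret
}

def retrieve(question: str) -> list[str]:
    """Naive keyword retrieval standing in for a vector search."""
    words = question.lower().split()
    return [text for key, text in KNOWLEDGE_BASE.items()
            if any(word in key or word in text.lower() for word in words)]

def answer(question: str) -> str:
    context = "\n".join(retrieve(question))
    # In a real system this prompt is sent to an LLM; whatever was retrieved,
    # including any secret sitting in the source page, ends up in the prompt
    # and very likely in the generated answer and the logs.
    prompt = f"Context:\n{context}\n\nQuestion: {question}"
    return prompt  # stub: a real call would be something like llm.generate(prompt)

print(answer("How do I connect to our internal dev environment?"))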
Now, let's walk through an example RAG implementation:

1. An internal user asks the AI assistant chatbot, "How do I connect to our internal dev environment?"
2. The LLM checks Confluence or Jira for relevant documents.
3. It finds an old page with a still-valid hardcoded password: "root:myp@ssword123!"
4. The LLM includes that page in its context and says: "You can connect using the following credentials…"

That is less than ideal, even if the user is a developer who is hurriedly trying to get their project deployed. It is even worse if the user is an unauthorized attacker trying to steal whatever they can find after breaching your perimeter. The core of the issue is that our source documents weren't built with AI or secrets in mind. Unlike with code and developer workflows, there are no safeguards in place to prevent someone from adding API keys, login instructions with passwords, or even full-blown database connection strings. This effectively turns your chatbot into a very friendly and helpful internal secrets-leaking engine. Given that NHIs outnumber humans at least 45 to 1, it is highly likely that any secret leaked in this way belongs to a non-human identity. Maybe no one ever rotated it. Maybe no one even knows it is there. Now it's surfaced by your AI, logged, and exposed.

Logging and Feedback Loops Exposing Secrets

Adding to the risks of RAG finding secrets in source documents, AI engineers and machine learning teams can just as easily leak NHI credentials while trying to build observability into these systems. Since we cannot see what is going on inside the models at runtime, we need to log everything from the initial prompt to the retrieved context and the generated response in order to tune the system. If a secret is exposed in any one of those logged steps, you now have multiple copies of the same leaked secret. This would be worrying enough if your logs remained internal to your organization, but most dev teams rely on third-party logging tools, meaning your secrets are no longer just on your servers. Unfortunately, in many organizations, engineers store logs in cloud buckets or on local machines that are not governed by the usual security controls. Anywhere along the logging pipeline where they might be intercepted or read by an attacker is now a potential spot where a secret could be compromised. And if you're using a third-party LLM (like OpenAI), you may have zero visibility into where those logs go.

Before You Deploy That Next LLM, Get Ahead of the Sprawl

If you're deploying AI today, or planning to soon, there are a few key things you can do right now to get ahead of the risk:

- Scrub sources before you connect: Scan and clean every knowledge base you plan to use with RAG: Confluence, Jira, Slack, internal wikis. Treat them like code; secrets don't belong there. (A minimal sketch of this kind of scan follows this list.)
- Inventory your NHIs: Build a list of your non-human identities, including service accounts, bots, agents, and pipelines. Track what secrets they use and who owns them.
- Vault everything: Move secrets out of code and into secrets managers. Use tools like HashiCorp Vault, CyberArk, or AWS Secrets Manager. Make sure rotation is enforced.
- Monitor and sanitize AI logs: Treat AI system logs as sensitive infrastructure. Monitor them. Sanitize them. Audit them regularly.
- Use role-based access to RAG: Restrict what documents can be retrieved based on user roles and document sensitivity. Just because it's in your knowledge base doesn't mean the chatbot should share it with anyone who asks.
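As a starting point for the source-scrubbing step above, here is a minimal Python sketch that flags common credential patterns in exported pages before they are indexed; the patterns are illustrative and far from exhaustive, and purpose-built scanners cover many more secret types:

Python

# scrub_sources.py -- a minimal sketch for flagging likely secrets
# before documents are indexed for RAG. Patterns are illustrative only.
import re

SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "password_assignment": re.compile(r"(?i)\b(?:password|passwd|pwd)\s*[:=]\s*\S+"),
    "bearer_token": re.compile(r"(?i)\bbearer\s+[a-z0-9._-]{20,}"),
}

def find_secrets(text: str) -> list[tuple[str, str]]:
    """Return (pattern_name, match) pairs for anything that looks like a secret."""
    hits = []
    for name, pattern in SECRET_PATTERNS.items():
        for match in pattern.findall(text):
            hits.append((name, match))
    return hits

page = "To reach dev, log in with password: myp@ssword123! and the usual VPN."
for name, value in find_secrets(page):
    print(f"possible secret ({name}): {value}")

The same kind of check can be run over AI system logs before they leave your environment, which supports the monitoring and sanitization step as well.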
The Future Of AI Is The Future Of Machine-to-Machine Communication

The adoption of AI brings some amazing promises. RAG is making it so much more powerful. But in this new landscape, machines are talking to machines more than ever. And those machines, your NHIs, are now accessing and potentially exposing your data and introducing new operational risks. Don't just secure your secrets, though that is undoubtedly part of the solution. The time has come to govern your non-human identities. Track them. Map them. Understand how they interact with your AI stack. Because the real secret to secure AI isn't just smarter models — it's smarter identity management.
Cloud security refers to the technologies, best practices, and safety guidelines that help to protect your data from human error, insider threats, and external attacks. It naturally covers a wide range of procedures aimed at securing systems from data breaches, data loss, unauthorized access, and other cybersecurity-related risks that are growing from year to year. According to GitProtect's State of DevOps Threats report, the number of incidents in GitHub grew by over 20%, and around 32% of events in GitLab had an impact on service performance and customers. Moreover, it's worth mentioning that the cost of failures is growing as well. The average cost of recovering from a ransomware attack is around $2.73 million, the average cost of a data breach is $4.88 million, and every minute of downtime can cost up to $9K. To prepare for any threats and risks in the cloud, and to learn how to mitigate them, companies should first understand what cloud security and privacy measures are.

Cloud Services and Environments — Let's Break Down the Types

Before we jump to cloud security, we need to understand the basics, because cloud components are usually secured from two main viewpoints: cloud service types and cloud environments. Let's start with the cloud services that providers use as modules for creating cloud environments. Cloud services can be provided in a range of ways, each with its own security concerns specific to distinct areas of IT infrastructure and application administration. Thus, you can find:

- Infrastructure as a service (IaaS): This kind of service provides you with virtualized computing resources like virtual machines, servers, and storage over the Internet. Its focus is to offer and manage basic infrastructure components, while users are responsible for configuring and taking care of network security controls, access management, and data encryption.
- Platform as a service (PaaS): This model gives you a platform for developing, deploying, and maintaining your applications or other software projects without the need to handle any of the underlying infrastructure. This should prove cost-effective and give you the scalability you need. In this type of service, security focuses on securing the platform and applications, including secure coding, vulnerability assessments, and built-in security measures such as web application firewalls (WAFs).
- Software as a service (SaaS): This kind of service provides software applications over the Internet, which removes the requirement for local installation. It equips you with on-demand computing resources and usually operates on a "pay-as-you-go" model. In terms of security, the focus is on protecting the application along with its data. Security practices often include data encryption, access controls, authentication, backups, and disaster recovery.

Then, we have the different cloud environments or, so to speak, deployment models. Why is it important to understand the difference between them? Like cloud service types, the cloud environments help us get a better understanding of the split of responsibilities between service providers and their customers:

- Public clouds, which run on shared infrastructure supplied by third-party cloud service providers.
This brings security concerns, as the resources are shared, so you will need to implement strong access controls, encryption, and constant monitoring to protect your data and applications.
- Private clouds are specialized environments for a specific enterprise that provide increased protection and data management. They successfully mitigate internal and external threats by implementing strict access controls, network segmentation, and encryption.
- Hybrid clouds, which make use of both public and private cloud environments. This way, you get smooth data and application mobility while still maintaining flexibility and security. For sensitive applications, this architecture could use on-premises infrastructure and rely on the public cloud for better scalability and cost savings. Some of the security considerations in a hybrid cloud include enforcing consistent security rules across environments, encrypting data in transit and at rest, and maintaining a reliable network connection.
- Multi-clouds, which use services from different cloud providers to prevent things like vendor lock-in and to benefit from the best possible solutions out there. However, taking care of complex security measures and guaranteeing interoperability across cloud platforms (different cloud services and solutions that work together seamlessly) can be challenging. To stay protected across several cloud environments, successful multi-cloud security methods require some form of centralized security administration, robust authentication systems, and regular audits.

Benefits of Cloud Security

Strong cloud data security and privacy measures have several benefits that help businesses protect their data and maximize operational effectiveness. Among them, we can mention:

- Lower costs, as you don't need to pay for dedicated hardware
- Improved reliability and availability, as cloud services should ensure the accessibility of their services (to achieve this, they should have constant security monitoring)
- Application security, as cloud providers regularly perform security testing and other secure development practices to minimize the risk of vulnerabilities in their own infrastructure
- Customer support to help users deal with issues 24/7
- Access control and identity management to help organizations authenticate only authorized users
- Compliance with industry standards such as ISO 27001, SOC, GDPR, HIPAA, and others relevant to the industry; it's the service provider's obligation to undergo strict security audits and certifications to assure that their service is secure
- Updates and innovations, as service providers constantly develop their products to make them better and more secure for their users

Why Cloud Security Is Important — Let's Face the Challenges

Cloud security is important to maintain customer trust, prevent cybersecurity-related issues from affecting your business, and stay compliant with regulatory industry standards.

The Shared Responsibility Model

Moreover, we should clearly understand that cloud service providers operate under the shared responsibility model, which defines the roles and responsibilities of both parties, the provider and its customers. To make a long story short, a cloud service provider is responsible for its service availability and security, and a customer is responsible for their account data. Thus, if you accidentally delete your data or your data is corrupted, the service provider isn't responsible for restoring it. Your account data is your responsibility!
And if you think that nothing fails in the cloud, think again. There are documented outages, human error cases, cyberattacks, etc., all of which are potential threats to your business. That is why it is important to understand what your obligations are in terms of data protection and how to build your data protection strategy in the cloud.

Keep Up With Compliance Regulations

To be compliant in terms of cloud security means to follow the legal guidelines, data privacy regulations, and overall data protection standards. This especially applies to companies in highly regulated industries, like healthcare, energy, finance, etc. To become compliant with stringent security protocols, organizations should carefully evaluate cloud service providers; preferably, those cloud providers should themselves be compliant with security regulations like SOC 2, GDPR, ISO 27001, etc.

Best Practices for Cloud Security and Privacy

Let's move on to the most important aspects of cloud security. So, how can you strengthen your cyber defenses and take some of the stress off your shoulders?

1. Stay Up to Date With Patching

Outdated systems, security processes, or configurations can be exploited by hackers and put your organization at risk of data loss. Therefore, it is critical to stay up to date with the most recent and relevant security updates and upgrade your cloud infrastructure or systems accordingly.

2. Assess the Risks

It is important to thoroughly analyze any risks and vulnerabilities concerning your cloud data. By having a clear outline of these threats, your organization can prioritize them properly and deal with them effectively in a timely manner.

3. Encrypt Your Data

A key factor in keeping your cloud data protected is encryption. It should be applied at both levels — at rest and in transit. It helps to ensure that even if data is stolen, it is unreadable without the decryption key.

4. Have Constant Monitoring and Auditing

You should constantly monitor your network. Keep track of all the devices that are interconnected, whether anyone tries to gain unauthorized access, and whether any attempts to alter data are made. You can do this manually or use monitoring software solutions. You should set up alerts to notify you of unauthorized access and of any new devices connecting to your network. Monitoring helps you detect potential threats earlier and deal with them, leading to better data security.

5. Implement a Zero-Trust Model

For maximum security, you should adopt a zero-trust model. That means zero trust for individuals inside and outside your organization. This way, you stay protected from malicious insiders within your organization and former employees who were let go, as well as from hackers. The main practices include strong access controls, authentication mechanisms, and sticking to the least-privilege principle.

6. Manage Access Controls

Another key element of strong data protection is having clearly defined and effective access controls. Lay out what kind of access your team members need in order to complete their tasks, and then limit everyone's access according to their job.

7. Adopt Secure Passwords and MFA

Multi-factor authentication (MFA) mechanisms are key, because a password and verification of the user on the other side of the screen are among your first lines of defense. If passwords throughout your organization are simple 8-character phrases, they are too weak; a hacker can break this kind of password in 37 seconds.
Therefore, you should educate your staff about strong passwords, implement MFA mechanisms, and apply policies for the kinds of passwords to use (length, numbers, special characters, etc.).

8. Use Antivirus Software and Firewalls

An antivirus is software that identifies and gets rid of any malware on a device, and a firewall is a mechanism that stops any unauthorized access to and from your system or network. These help to guard against cyber threats like malware, ransomware, or hackers trying to access your data in general. Firewalls monitor incoming and outgoing network traffic against predefined security rules and have the ability to block or allow data packets.

9. Educate Your Team

In order for safety procedures to be effective, your team must clearly understand them. Make sure that everyone knows their roles and responsibilities and what potential risks are out there that could affect your organization, and encourage employees to report any suspicious security threats.

10. Make Backup Copies of Your Data

A backup and DR solution will help ensure that you never have to worry about losing your data. When you search for a backup option, make sure that it:

- adheres to the 3-2-1 backup rule,
- encrypts your data both at rest and in transit (it's nice if you can use your own encryption key!),
- covers all of the data — both repositories and metadata,
- allows you to schedule and automate backups, and
- gives you the ability to perform granular restores, point-in-time restores, and incremental backups.

For compliance or archiving purposes, unlimited retention may come in handy too. All these features will help you recover your data in no time in a disaster scenario, such as accidental or intentional deletion of important data, ransomware, or platform outages (you can just access your backups, switch to another platform, and continue working from there).

Make Sure That Your Cloud Provider's Data Center Is Safe

When you choose a cloud provider, it's important to make sure that it stores your data in a secure data center. Make sure that your cloud service's data center has guaranteed physical security, undergoes regular audits, and has fire protection and technical support in place. These data centers should also be compliant with industry security standards such as ISO 27001, the EN 1047-2 standard, SOC 2 Type 2, EN 50600, SOC 3, FISMA, DCID, DOD, HIPAA, ISO 50001, PCI DSS Level 1, LEED Gold certification, and SSAE 16. All these measures are important if you decide to go through auditing to become compliant.

Have Compliance Checks

Regular compliance checks and auditing are important to make sure that your organization keeps adhering to security standards and regulations. By doing so, you boost the security of your company's data and support business continuity. Auditing is also important for transparency and compliance with security standards like HIPAA, GDPR, ISO, or SOC.

Takeaway

To sum up, cloud security is important to stay protected against the threats of human error, outages, and cyberattacks. Moreover, adhering to the shared responsibility model under which most VCS platforms operate is also important for the security of your data. We had a look at how you can benefit from backup and disaster recovery solutions in terms of the shared responsibility model and outlined the duties of each party in cloud data protection. The SaaS provider is responsible for the uptime and security of their own infrastructure, but your data is your own responsibility.
Cloud services such as GitHub and GitLab have no obligation to help you restore your data if it gets deleted, stolen, or corrupted. That is why it is important to have appropriate security measures, such as backup and DR strategies, in your cyber defenses to stay compliant with the shared responsibility model and keep your data safe.