In recent years, it has become common to experiment with, adopt, and use generative AI-based software development tools
in professional software development. To understand the effects of this generative AI transformation in the
Finnish context, we conducted a survey targeting software professionals working in Finland. We collected 163 survey
responses over two months, from December 2025 to February 2026. In this paper, we report a qualitative analysis of 100
respondents’ perspectives on the future role of generative AI in professional software development, based on the responses
to two optional open-ended questions in the survey. The results offer a practitioner-centred snapshot of how software
professionals currently reason about the effects of generative AI on roles, skills, productivity, software quality, and the
sustainability of expertise in the field.
The rapid, often unstructured adoption of Generative AI (GenAI) in software-intensive organizations creates a
socio-technical challenge. While GenAI can accelerate individual tasks, these gains are often offset by added coordination,
verification, and governance effort across the software development lifecycle (SDLC). This paper presents DevCoach as a
reference architecture built on the Model Context Protocol (MCP) to make GenAI assistance more lifecycle-aware, governable,
and inspectable in organizational settings. DevCoach treats model outputs as proposals rather than authoritative actions,
and couples generation to lifecycle-scoped retrieval, bounded tool use, deterministic validation, and provenance-linked
audit artifacts. Its core mechanism consists of two Context Refinement Loops (CRLs): CRL 1 stabilizes user intent into an explicit
retrieval contract, and CRL 2 performs bounded retrieval, evidence verification, and provenance construction to produce a
verified context pack. The contribution is therefore architectural and design-oriented rather than a benchmarked
implementation study. In this way, the paper reframes GenAI integration as a software engineering design and governance
problem rather than a prompt-level optimization problem.
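To make the two loops concrete, the following is a minimal sketch of how CRL 1 and CRL 2 might fit together. It is not DevCoach's implementation: the class names (`RetrievalContract`, `Evidence`, `ContextPack`), the keyword-overlap relevance check, and the SHA-256 provenance digest are all assumptions introduced for illustration, and MCP tool calls are replaced by an in-memory corpus.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class RetrievalContract:
    """Hypothetical output of CRL 1: an explicit statement of what may be retrieved."""
    intent: str
    lifecycle_phase: str  # e.g. "requirements" or "testing"
    max_documents: int    # upper bound on retrieval


@dataclass(frozen=True)
class Evidence:
    """Hypothetical unit of retrievable project knowledge."""
    source_id: str
    lifecycle_phase: str
    text: str


@dataclass
class ContextPack:
    """Hypothetical output of CRL 2: verified evidence plus provenance records."""
    contract: RetrievalContract
    evidence: List[Evidence]
    provenance: List[Tuple[str, str]]  # (source_id, SHA-256 digest of the text)


def crl1_stabilize_intent(raw_request: str, lifecycle_phase: str,
                          max_documents: int = 3) -> RetrievalContract:
    """CRL 1 (sketch): normalize a free-form request into an explicit contract."""
    intent = " ".join(raw_request.lower().split())
    if not intent:
        raise ValueError("intent must not be empty")
    return RetrievalContract(intent, lifecycle_phase, max_documents)


def crl2_build_context_pack(contract: RetrievalContract,
                            corpus: List[Evidence]) -> ContextPack:
    """CRL 2 (sketch): bounded, lifecycle-scoped retrieval with provenance."""
    keywords = set(contract.intent.split())
    selected: List[Evidence] = []
    for doc in corpus:
        if len(selected) >= contract.max_documents:
            break  # retrieval is bounded by the contract
        in_scope = doc.lifecycle_phase == contract.lifecycle_phase
        relevant = bool(keywords & set(doc.text.lower().split()))
        if in_scope and relevant:  # deterministic verification step
            selected.append(doc)
    provenance = [
        (doc.source_id, hashlib.sha256(doc.text.encode("utf-8")).hexdigest())
        for doc in selected
    ]
    return ContextPack(contract, selected, provenance)
```

In this sketch the model never sees the corpus directly. It would receive only the `ContextPack`, and each item in the pack can be traced back to a source identifier and a content digest, which is one simple way to read "provenance-linked" in the text above.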
Software testing critically depends on test oracles, yet existing test oracles are often incomplete and insufficient for detecting bugs where implementations deviate from their specifications. Meanwhile, despite advances in test oracle
construction, existing techniques typically rely on coarse-grained failure signals or require substantial manual effort, and thus remain inadequate for detecting specification-violation bugs. To address these limitations, we propose
JavaOracle, a specification-driven approach that leverages large language models (LLMs) to reason over specifications and systematically enhance test oracles. Specifically, JavaOracle consists of three stages. First, it leverages LLMs
to analyze specifications and derive additional test oracles that are not covered by existing tests, while integrating a root-cause–guided repair workflow to ensure that the enhanced test cases are syntactically and semantically valid.
Second, it designs a multi-agent debate workflow to distinguish previously unknown specification-violation bugs from assertion failures caused by LLM hallucinations, thereby mitigating the impact of hallucinations. Third, unlike existing
approaches that stop at bug detection, JavaOracle further automates test case minimization and bug report generation, producing submission-ready reports without manual effort.
We evaluate JavaOracle on 3,961 test cases for the Java SE API from OpenJDK. Experimental results on the latest OpenJDK standard library show that JavaOracle substantially outperforms state-of-the-art baselines, including Fuzz4All,
ChatAssert, and Randoop. In total, JavaOracle discovers 45 previously unknown bugs, all of which have been confirmed or fixed by developers, and many of which have persisted since their initial implementation. In a full end-to-end pipeline run,
JavaOracle narrows execution failures down to an average of 6 reports per run, achieving 88.9% precision. In contrast, the baseline approaches require exhaustive manual inspection to identify only 0, 3, and 2 real bugs, respectively. Further
analysis shows that test cases enhanced by JavaOracle achieve high validity, with an execution pass rate of 79.1%, compared to 35.4%, 92.2% (with 55.5% of test cases unchanged), and 77.6% for the baselines. Ablation studies further demonstrate
the effectiveness of JavaOracle components, while the automated pipeline significantly reduces manual analysis effort.
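The second and third stages described above can be illustrated with a small sketch. It is not JavaOracle's implementation: the abstract does not specify the debate protocol or the minimization algorithm, so a majority vote over a fixed number of rounds and a greedy one-statement-at-a-time reduction are used here as stand-ins. All function names are hypothetical, the agents are plain callables standing in for LLM calls, and the first stage (oracle derivation and repair) is omitted because it depends on a live model.

```python
from typing import Callable, Dict, List, Tuple

Verdict = str  # either "bug" or "hallucination"
Agent = Callable[[str, List[Verdict]], Verdict]


def debate(failure: str, agents: List[Agent], rounds: int = 2) -> Verdict:
    """Stage 2 (sketch): agents vote, see the previous round's votes, and vote again.

    The final round is decided by strict majority; a tie counts as a
    hallucination, which is the conservative choice for report precision.
    """
    verdicts: List[Verdict] = []
    for _ in range(rounds):
        # Every agent in a round sees only the previous round's verdicts.
        verdicts = [agent(failure, verdicts) for agent in agents]
    return "bug" if verdicts.count("bug") * 2 > len(verdicts) else "hallucination"


def minimize(statements: List[str],
             still_fails: Callable[[List[str]], bool]) -> List[str]:
    """Stage 3 (sketch): greedily drop statements while the failure persists."""
    current = list(statements)
    i = 0
    while i < len(current):
        candidate = current[:i] + current[i + 1:]
        if candidate and still_fails(candidate):
            current = candidate  # statement i was not needed to reproduce
        else:
            i += 1               # statement i is required; keep it
    return current


def run_pipeline(failures: List[Tuple[str, List[str]]],
                 agents: List[Agent],
                 still_fails: Callable[[List[str]], bool]) -> List[Dict[str, object]]:
    """Filter failures through the debate, then minimize the survivors."""
    reports: List[Dict[str, object]] = []
    for description, statements in failures:
        if debate(description, agents) != "bug":
            continue  # treated as an LLM-induced assertion failure
        reports.append({
            "test": description,
            "steps": minimize(statements, still_fails),
        })
    return reports
```

In practice `still_fails` would compile and run the reduced test against the JDK under test, and each agent would be a prompted model rather than a fixed rule. The sketch shows only the control flow that separates candidate bugs from suspected hallucinations before any report is produced.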