Database Engineering with Spring Boot
Most backend failures trace back to the database — N+1 queries silently destroying performance, transactions with wrong isolation levels causing data corruption, connection pools exhausted under load. This section teaches database engineering the way senior engineers think about it: from JPA internals to production query optimization.
- JPA & Hibernate Internals
- Entity Mapping — Done Right
- Entity Lifecycle & Persistence Context
- Relationships — OneToMany, ManyToOne, ManyToMany
- Lazy vs Eager Loading
- The N+1 Problem — Detection & Fix
- Transactions & @Transactional
- Isolation Levels & Locking
- JPQL, Criteria API & Native Queries
- Connection Pooling with HikariCP
- Schema Migrations — Flyway & Liquibase
- Query Optimization & Indexing
- When NOT to Use JPA
JPA & Hibernate Internals
JPA (Java Persistence API) is a specification — a set of interfaces. Hibernate is the most popular implementation. Spring Data JPA wraps Hibernate with repository abstractions. Understanding this three-layer stack is essential: when something goes wrong, you need to know which layer to debug.
JPA (Specification)
Defines interfaces: EntityManager, @Entity, @Table, JPQL syntax. A contract — no implementation code.
Hibernate (ORM)
Implements JPA. Handles SQL generation, caching, dirty checking, proxies, and schema generation. The actual engine doing the work.
Spring Data JPA
Wraps Hibernate with JpaRepository, derived query methods, pagination, and auto-transaction management. Reduces boilerplate dramatically.
What Hibernate Does For You — And What It Hides
Hibernate hides SQL behind objects, which speeds development — but it also hides how many SQL statements run, when they run, and what indexes they use. Engineers who treat Hibernate as a black box eventually ship slow applications. Always watch the SQL Hibernate generates.
spring:
jpa:
show-sql: true # prints SQL to stdout
properties:
hibernate:
format_sql: true # pretty-print SQL
use_sql_comments: true # adds HQL source as comment
logging:
level:
org.hibernate.SQL: DEBUG # see all SQL
org.hibernate.type.descriptor.sql: TRACE # bind parameter values (Hibernate 5)
org.hibernate.orm.jdbc.bind: TRACE # bind parameter values (Hibernate 6 / Boot 3)
The Persistence Context Explained
The Persistence Context is Hibernate's first-level cache and change-tracking unit. It lives for the duration of a @Transactional method. Every entity you load becomes "managed" — Hibernate tracks every field change and flushes them to the database at the right time.
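A toy model makes the mechanism concrete: snapshot state at load, diff at flush. This is not Hibernate code — every name below is invented for illustration:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Toy model of dirty checking: the persistence context snapshots each entity's
// state at load time; at flush, entities whose current state differs from the
// snapshot get an UPDATE.
class ToyPersistenceContext {
    private final Map<Object, Map<String, Object>> snapshots = new IdentityHashMap<>();

    void manage(Object entity, Map<String, Object> state) {
        snapshots.put(entity, new HashMap<>(state)); // defensive copy = the snapshot
    }

    List<Object> findDirty(Function<Object, Map<String, Object>> currentState) {
        List<Object> dirty = new ArrayList<>();
        for (Map.Entry<Object, Map<String, Object>> e : snapshots.entrySet()) {
            if (!e.getValue().equals(currentState.apply(e.getKey()))) {
                dirty.add(e.getKey()); // changed since load → needs an UPDATE
            }
        }
        return dirty;
    }
}

public class DirtyCheckingDemo {
    static class Order { String status; Order(String s) { status = s; } }

    public static void main(String[] args) {
        ToyPersistenceContext ctx = new ToyPersistenceContext();
        Order a = new Order("PENDING");
        Order b = new Order("PENDING");
        ctx.manage(a, Map.of("status", a.status));
        ctx.manage(b, Map.of("status", b.status));

        a.status = "PROCESSING"; // plain field mutation — no save() call

        List<Object> dirty = ctx.findDirty(o -> Map.of("status", ((Order) o).status));
        System.out.println(dirty.size()); // only 'a' gets an UPDATE at flush
    }
}
```

This is why managed entities need no explicit save(): the flush step discovers the change by comparison, not by notification.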
Entity Mapping — Done Right
Poor entity mapping is the root cause of many Hibernate problems — wrong fetch types, missing indexes, bad ID generation strategies. Get this right from day one.
@Entity
@Table(
name = "orders",
indexes = {
@Index(name = "idx_orders_customer_id", columnList = "customer_id"),
@Index(name = "idx_orders_status_created", columnList = "status, created_at"),
@Index(name = "idx_orders_created_at", columnList = "created_at DESC")
}
)
@EntityListeners(AuditingEntityListener.class) // Spring Data auditing
public class Order {
@Id
@GeneratedValue(strategy = GenerationType.SEQUENCE,
generator = "order_seq")
@SequenceGenerator(name = "order_seq", sequenceName = "order_id_seq",
allocationSize = 50) // batch 50 IDs per DB call
private Long id;
// Store enums as strings, not ordinals — reordering the enum would
// silently change what already-stored numbers mean
@Enumerated(EnumType.STRING)
@Column(nullable = false, length = 20)
private OrderStatus status;
@Column(nullable = false, precision = 12, scale = 2)
private BigDecimal total;
// Many-to-one: always LAZY — never load the whole customer for every order query
@ManyToOne(fetch = FetchType.LAZY, optional = false)
@JoinColumn(name = "customer_id", nullable = false)
private Customer customer;
// One-to-many: always LAZY
@OneToMany(mappedBy = "order", cascade = CascadeType.ALL,
orphanRemoval = true, fetch = FetchType.LAZY)
private List<OrderItem> items = new ArrayList<>();
// Audit fields — populated automatically by Spring Data
@CreatedDate
@Column(nullable = false, updatable = false)
private LocalDateTime createdAt;
@LastModifiedDate
@Column(nullable = false)
private LocalDateTime updatedAt;
// Optimistic locking — prevents lost updates
@Version
private Long version;
// equals/hashCode must survive id assignment on persist:
// compare by id only when present, keep hashCode constant
@Override
public boolean equals(Object o) {
if (this == o) return true;
if (!(o instanceof Order)) return false;
Order other = (Order) o;
return id != null && id.equals(other.getId());
}
@Override
public int hashCode() { return getClass().hashCode(); }
}
ID Generation Strategies
SEQUENCE is the best strategy for databases with native sequences (PostgreSQL, Oracle). Hibernate pre-allocates IDs in blocks (allocationSize), so one sequence round trip covers 50 inserts — and JDBC batch inserts remain possible.
@GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "order_seq")
@SequenceGenerator(name = "order_seq", sequenceName = "order_id_seq",
allocationSize = 50) // fetch 50 IDs at once — 1 DB roundtrip per 50 inserts
private Long id;
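The effect of allocationSize is simple arithmetic (a sketch — the pooled optimizer's exact protocol differs slightly, but the round-trip count is the point):

```java
public class AllocationSizeMath {
    // ceil(inserts / allocationSize): each sequence round trip yields a block of IDs
    static long sequenceCalls(long inserts, long allocationSize) {
        return (inserts + allocationSize - 1) / allocationSize;
    }

    public static void main(String[] args) {
        System.out.println(sequenceCalls(10_000, 1));  // no pre-allocation: one call per insert
        System.out.println(sequenceCalls(10_000, 50)); // allocationSize = 50
    }
}
```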
IDENTITY uses database auto-increment (AUTO_INCREMENT in MySQL, SERIAL in PostgreSQL). Simple but disables JDBC batch inserts — Hibernate needs to know the ID immediately after each insert.
With IDENTITY, Hibernate cannot batch INSERT statements because it needs the generated ID from each insert before it can continue. If you're doing bulk inserts, use SEQUENCE instead.
UUID is ideal for distributed systems where IDs must be globally unique without database coordination. Downside: larger storage (16 bytes vs 8), random insertion order causes B-tree index fragmentation.
@Id
@GeneratedValue(strategy = GenerationType.UUID) // Hibernate 6+
private UUID id;
// Or for time-ordered UUIDs (better index locality):
@Id
@UuidGenerator(style = UuidGenerator.Style.TIME) // Hibernate 6 time-based style
private UUID id;
GenerationType.TABLE uses a dedicated database table for ID generation with a pessimistic lock on every insert. This creates a single-row bottleneck that serializes all inserts across your entire application. It will destroy throughput under any meaningful load.
Entity Lifecycle & Persistence Context
Every JPA entity exists in one of four states. Knowing which state your entity is in prevents the most common Hibernate bugs: unexpected updates, detached entity exceptions, and missing data.
TRANSIENT
New object, not associated with any persistence context. No DB row. Hibernate doesn't know it exists. new Order() is transient.
MANAGED
Associated with an open persistence context. Hibernate tracks all changes. Any field mutation will produce an UPDATE on flush.
DETACHED
Was managed, now the persistence context closed. Has a DB row but changes are no longer tracked. Common after @Transactional method returns.
REMOVED
Scheduled for deletion. Still in the persistence context, will be DELETE-d on flush/commit.
@Service
@Transactional
public class OrderService {
@PersistenceContext private EntityManager entityManager;
@Autowired private OrderRepository orderRepository;
public void demonstrateLifecycle(Long orderId) {
// TRANSIENT — not yet known to Hibernate
Order newOrder = new Order();
newOrder.setStatus(OrderStatus.PENDING);
// TRANSIENT → MANAGED (scheduled for INSERT on flush)
entityManager.persist(newOrder);
// MANAGED — load from DB, entity tracked by persistence context
Order existing = orderRepository.findById(orderId).orElseThrow();
// Still MANAGED — this change is tracked, UPDATE will run on flush
existing.setStatus(OrderStatus.PROCESSING);
// NO save() call needed — dirty checking handles it automatically!
// MANAGED → REMOVED (DELETE on flush)
entityManager.remove(existing);
// DETACHED entity re-attached — Hibernate does a SELECT then UPDATE
Order detached = getDetachedOrderFromSomewhere();
Order reattached = entityManager.merge(detached);
}
// @Transactional ends → flush → SQL executes → context closes → all entities DETACHED
}
// COMMON BUG: modifying a DETACHED entity
@Service
public class BuggyService {
public Order getOrder(Long id) {
return orderRepository.findById(id).orElseThrow();
// Returns a DETACHED entity — the repository's own transaction ended when findById returned
}
public void updateOrder(Order order) {
order.setStatus(OrderStatus.SHIPPED);
// BUG: order is DETACHED — this change is SILENTLY IGNORED
// You must call orderRepository.save(order) to re-attach and update
}
}
Relationships — OneToMany, ManyToOne, ManyToMany
JPA relationships are powerful but carry hidden costs. Every relationship annotation is a decision about SQL joins, memory usage, and query complexity. Map them wrong and you'll spend months fighting performance issues.
// Parent side — use mappedBy to avoid a join table
@Entity
public class Order {
@OneToMany(
mappedBy = "order", // field name in OrderItem that owns the FK
cascade = CascadeType.ALL, // persist/merge/remove propagates to items
orphanRemoval = true, // delete item when removed from collection
fetch = FetchType.LAZY // ALWAYS LAZY for collections
)
private List<OrderItem> items = new ArrayList<>();
// Helper methods to keep both sides in sync
public void addItem(OrderItem item) {
items.add(item);
item.setOrder(this);
}
public void removeItem(OrderItem item) {
items.remove(item);
item.setOrder(null);
}
}
// Child side — owns the foreign key column
@Entity
public class OrderItem {
@ManyToOne(fetch = FetchType.LAZY)
@JoinColumn(name = "order_id", nullable = false)
private Order order;
}
// ManyToOne is EAGER by default — change it immediately
@ManyToOne(fetch = FetchType.LAZY) // ALWAYS override to LAZY
@JoinColumn(name = "customer_id", nullable = false)
private Customer customer;
// WHY: if you load 100 orders with EAGER customer,
// Hibernate loads 100 customers too — even if you never use them.
// LAZY means: Customer proxy created, actual SELECT only when you call order.getCustomer().getName()
// ManyToMany uses a join table automatically
@Entity
public class Product {
@ManyToMany(fetch = FetchType.LAZY)
@JoinTable(
name = "product_tags",
joinColumns = @JoinColumn(name = "product_id"),
inverseJoinColumns = @JoinColumn(name = "tag_id")
)
private Set<Tag> tags = new HashSet<>();
// Use Set not List for ManyToMany — avoids duplicate join table rows
}
// Better approach for ManyToMany with extra columns on the join table:
// Create an explicit junction entity (ProductTag) with its own @Id
// This gives you full control over the join table and its data
// CascadeType.PERSIST — when you persist parent, children are persisted
// CascadeType.MERGE — when you merge parent, children are merged
// CascadeType.REMOVE — when you delete parent, children are deleted
// CascadeType.REFRESH — when you refresh parent, children are refreshed
// CascadeType.DETACH — when parent detaches, children detach
// CascadeType.ALL — all of the above
// SAFE: cascade ALL for true parent-child (Order → OrderItems)
@OneToMany(cascade = CascadeType.ALL, orphanRemoval = true)
private List<OrderItem> items;
// DANGEROUS: cascade REMOVE for shared entities
// Deleting an Order should NOT delete the Customer!
@ManyToOne(cascade = CascadeType.REMOVE) // WRONG — deletes customer!
private Customer customer;
// RULE: CascadeType.ALL only for private ownership relationships
// Never cascade to entities that can be referenced by other parents
Lazy vs Eager Loading
This is one of the most misunderstood topics in JPA. Getting the defaults wrong causes either LazyInitializationException in production or silent performance disasters where Hibernate loads megabytes of data you never use.
LAZY (default for collections)
A proxy is returned. The actual SQL runs only when you first access the collection. Zero cost if you never access it. Throws LazyInitializationException if accessed outside a transaction.
EAGER (default for @ManyToOne)
Related data is loaded in the same query. Always costs a JOIN, even if you never use the data. Fine for single objects, catastrophic for collections.
The LazyInitializationException
LazyInitializationException: could not initialize proxy — no Session means you tried to access a lazy collection after the Hibernate session (persistence context) was closed. This typically happens when you return an entity from a @Transactional service method and then access its lazy associations in the controller or view layer.
// ── THE PROBLEM ──────────────────────────────────────────────
@Service
public class OrderService {
@Transactional
public Order getOrder(Long id) {
return orderRepository.findById(id).orElseThrow();
// @Transactional ends here — session CLOSES
}
}
@RestController
public class OrderController {
@GetMapping("/orders/{id}")
public OrderDto getOrder(@PathVariable Long id) {
Order order = orderService.getOrder(id);
order.getItems().size(); // BOOM: LazyInitializationException
// order.items is a proxy — the session is already closed
return toDto(order); // never reached — the line above throws (toDto illustrative)
}
}
// ── FIX 1: Fetch join in the query (BEST for known access patterns) ──
@Query("SELECT o FROM Order o LEFT JOIN FETCH o.items WHERE o.id = :id")
Optional<Order> findByIdWithItems(@Param("id") Long id);
// ── FIX 2: Use @EntityGraph (cleaner than JPQL join fetch) ────────────
@EntityGraph(attributePaths = {"items", "items.product", "customer"})
Optional<Order> findById(Long id);
// ── FIX 3: Map to DTO inside the transaction (BEST for APIs) ──────────
@Service
public class OrderService {
@Transactional(readOnly = true)
public OrderDetailDto getOrder(Long id) {
Order order = orderRepository.findByIdWithItems(id).orElseThrow();
return orderMapper.toDetailDto(order);
// DTO created inside transaction — no lazy loading issues ever
}
}
// ── WRONG FIX: spring.jpa.open-in-view=true (default!) ───────────────
// Keeps the session open through the HTTP request lifecycle
// Allows lazy loading in controllers — but silently fires N+1 queries
// ALWAYS set: spring.jpa.open-in-view=false in production
spring.jpa.open-in-view=true is Spring Boot's default. It keeps the Hibernate session open for the entire HTTP request so lazy loading "just works" in controllers. In production this silently fires extra SQL queries at the worst possible time — during response serialization. Always set spring.jpa.open-in-view=false and handle loading explicitly.
The N+1 Problem — Detection & Fix
The N+1 problem is the most common Hibernate performance killer. It silently fires hundreds of SQL queries when you expected one. A page that loads in 50ms with 10 orders starts taking 5 seconds with 1,000 orders — and nobody notices until production.
You run 1 query to get N orders. Then for each order, Hibernate runs 1 query to get its customer. Total: 1 + N queries. With 500 orders, that's 501 SQL statements instead of 1 JOIN query. Each query is fast but the round-trip overhead multiplies catastrophically.
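The arithmetic is worth spelling out. Assuming roughly 1 ms of round-trip latency per query (an illustrative number — real latency varies with network and load):

```java
public class NPlusOneCost {
    public static void main(String[] args) {
        double rttMs = 1.0; // assumed per-query round trip — an assumption, varies widely
        for (int n : new int[]{10, 500, 10_000}) {
            int queries = 1 + n;             // 1 list query + 1 per row
            double wallMs = queries * rttMs; // latency dominates, not query work
            System.out.printf("%6d orders -> %6d queries, ~%.0f ms of round trips%n",
                    n, queries, wallMs);
        }
        // A single JOIN FETCH query pays the round trip once, regardless of n.
    }
}
```

Even at an optimistic 1 ms per round trip, 10,000 orders cost ~10 seconds before the database does any real work; slower links make it far worse.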
// This looks innocent — but generates N+1 queries
@GetMapping("/orders/summary")
public List<OrderSummaryDto> getOrders() {
List<Order> orders = orderRepository.findAll(); // SELECT * FROM orders (1 query)
return orders.stream()
.map(o -> new OrderSummaryDto(
o.getId(),
o.getCustomer().getName(), // SELECT * FROM customers WHERE id=? (1 per order!)
o.getTotal()
))
.toList();
}
// With 500 orders: 1 + 500 = 501 queries
// With 10,000 orders: 10,001 queries — minutes of wall time
// ── FIX 1: JOIN FETCH ────────────────────────────────────────────────
@Query("SELECT o FROM Order o JOIN FETCH o.customer WHERE o.status = :status")
List<Order> findByStatusWithCustomer(@Param("status") OrderStatus status);
// Generates: SELECT o.*, c.* FROM orders o INNER JOIN customers c ON o.customer_id = c.id
// = 1 query regardless of order count
// ── FIX 2: @EntityGraph ─────────────────────────────────────────────
@EntityGraph(attributePaths = {"customer"})
List<Order> findByStatus(OrderStatus status);
// ── FIX 3: @BatchSize (for collections where join fetch multiplies rows) ──
@OneToMany(mappedBy = "order", fetch = FetchType.LAZY)
@BatchSize(size = 50) // loads items in batches of 50 IDs: WHERE order_id IN (?,?,?...)
private List<OrderItem> items;
// ── FIX 4: Project to DTO in JPQL — avoids entity loading entirely ────
@Query("""
SELECT new com.myapp.dto.OrderSummaryDto(
o.id, c.name, o.total, o.status, o.createdAt
)
FROM Order o JOIN o.customer c
WHERE o.status = :status
ORDER BY o.createdAt DESC
""")
List<OrderSummaryDto> findSummaryByStatus(@Param("status") OrderStatus status);
Detecting N+1 in Development
// 1. Enable SQL logging + count queries manually (crude but always works)
// logging.level.org.hibernate.SQL: DEBUG
// 2. Hibernate Statistics — counts queries per request
@Configuration
public class HibernateConfig {
@Bean
public HibernatePropertiesCustomizer hibernateStats() {
return props -> props.put("hibernate.generate_statistics", "true");
}
}
// Check log for: "Session Metrics" — shows query count, cache hits, time
// 3. datasource-proxy — logs query count per request with stack trace
// pom: com.github.gavlyukovskiy:datasource-proxy-spring-boot-starter
// application.yml:
// decorator.datasource.datasource-proxy.query.log-level: INFO
// 4. Hypersistence Optimizer — commercial, catches N+1 at test time
// Throws exception in tests when N+1 is detected — CI fails, never ships
// RULE: Always test list endpoints with 100+ items in dev.
// N+1 is invisible with 5 items but catastrophic with 500.
Quick check: a Product entity has @ManyToOne(fetch = FetchType.EAGER) to Category, and productRepository.findAll() returns 200 products. How many SQL queries run? With EntityManager.find(), EAGER becomes a JOIN — one query. But findAll() issues a JPQL query, and Hibernate fetches EAGER associations of JPQL results with separate SELECTs — one per distinct category not already in the persistence context. Either way, full Category rows are loaded even if you only need the product name. LAZY plus an explicit fetch join or a DTO projection is the better choice for list endpoints.
Transactions & @Transactional
@Transactional is the most used and most misunderstood annotation in Spring. Most engineers know it "wraps code in a transaction" — but they don't know about propagation, proxy limitations, and the production bugs caused by wrong configuration.
How @Transactional Works Internally
Spring creates a proxy around your bean. When a @Transactional method is called, the proxy intercepts the call, opens a transaction, calls the real method, then commits or rolls back.
Calling a @Transactional method from another method in the same class bypasses the proxy — the transaction annotation is silently ignored. This is one of the most common Spring bugs in production.
// ── SELF-INVOCATION BUG ─────────────────────────────────────────────
@Service
public class OrderService {
public void processOrder(Long id) {
createShipment(id); // BUG: calls through 'this', not the proxy
// @Transactional on createShipment is IGNORED
}
@Transactional // This is NEVER applied when called from above
public void createShipment(Long id) {
// no transaction — changes may not be saved
}
}
// FIX: inject the service into itself, or split into two beans
// ── PROPAGATION ────────────────────────────────────────────────────
@Transactional(propagation = Propagation.REQUIRED) // default: join existing or create new
@Transactional(propagation = Propagation.REQUIRES_NEW) // always create new, suspend outer
@Transactional(propagation = Propagation.SUPPORTS) // join if exists, non-tx if not
@Transactional(propagation = Propagation.MANDATORY) // must have outer tx — throws if not
@Transactional(propagation = Propagation.NEVER) // must NOT be in tx — throws if is
@Transactional(propagation = Propagation.NOT_SUPPORTED) // suspend outer tx, run non-tx
@Transactional(propagation = Propagation.NESTED) // savepoint within outer tx
// REQUIRES_NEW use case: audit log that must always commit even if main tx rolls back
@Service
public class AuditService {
@Transactional(propagation = Propagation.REQUIRES_NEW)
public void logEvent(AuditEvent event) {
auditRepository.save(event); // commits independently of caller's transaction
}
}
// ── READ-ONLY OPTIMIZATION ─────────────────────────────────────────
@Transactional(readOnly = true)
public List<OrderDto> findAllOrders() {
// readOnly = true:
// - Hibernate skips dirty checking (no change snapshots needed for reads)
// - flush mode is set to MANUAL — no accidental writes
// - routing datasources/drivers may direct the query to a read replica
// Use for ALL read-only operations — a meaningful performance gain
return orderRepository.findAll().stream()
.map(orderMapper::toDto) // mapper is illustrative
.toList();
}
// ── ROLLBACK RULES ────────────────────────────────────────────────
@Transactional(rollbackFor = Exception.class) // rollback on any Exception, checked included
@Transactional(noRollbackFor = BusinessException.class) // don't rollback for this
// Default: Spring only rolls back on RuntimeException and Error
// Checked exceptions do NOT trigger rollback by default!
Propagation in Practice
@Service
public class PaymentService {
private final AuditService auditService;
@Transactional
public PaymentResult processPayment(PaymentRequest request) {
try {
// This runs in Transaction A
Payment payment = createPayment(request);
chargeCard(payment);
// Audit log in SEPARATE transaction B — commits even if A fails
auditService.logPaymentAttempt(payment, "SUCCESS");
return PaymentResult.success(payment);
} catch (PaymentException e) {
// Transaction A will roll back — payment NOT saved
// But audit log (Transaction B) was already committed
auditService.logPaymentAttempt(null, "FAILED: " + e.getMessage());
throw e;
}
}
}
@Service
public class AuditService {
@Transactional(propagation = Propagation.REQUIRES_NEW)
public void logPaymentAttempt(Payment payment, String result) {
// Always commits independently — critical for audit trails
auditRepository.save(new AuditEvent(payment, result, Instant.now()));
}
}
Isolation Levels & Locking
Isolation levels are among the least-considered aspects of database usage in Spring Boot applications. Too weak a level invites data anomalies; too strong a level destroys throughput through lock contention. Every senior engineer must understand the trade-off.
// READ_UNCOMMITTED — can read uncommitted changes from other transactions
// Problem: dirty reads. Almost never use this.
@Transactional(isolation = Isolation.READ_UNCOMMITTED)
// READ_COMMITTED — only reads committed data (default in PostgreSQL/Oracle)
// Prevents dirty reads. Still allows non-repeatable reads and phantom reads.
// Good default for most applications.
@Transactional(isolation = Isolation.READ_COMMITTED) // use this for most reads
// REPEATABLE_READ — same row read twice gives same result (default MySQL InnoDB)
// Prevents dirty + non-repeatable reads. Still allows phantom reads.
// Higher overhead — use for financial calculations that read the same data twice.
@Transactional(isolation = Isolation.REPEATABLE_READ)
// SERIALIZABLE — full isolation, transactions appear sequential
// Prevents all anomalies. Severe performance impact — lock contention.
// Use only for critical financial operations (bank transfers, inventory allocation).
@Transactional(isolation = Isolation.SERIALIZABLE)
# DIRTY READ — reading uncommitted changes from another transaction
# Tx A writes balance=900 (not committed)
# Tx B reads balance=900
# Tx A rolls back — real balance is 1000
# Tx B acted on wrong data
# NON-REPEATABLE READ — same row, different value on second read
# Tx A reads order.status = PENDING
# Tx B updates order.status = SHIPPED (commits)
# Tx A reads order.status again = SHIPPED
# Decision based on first read is now wrong
# PHANTOM READ — same query, different rows on second run
# Tx A queries: SELECT COUNT(*) FROM seats WHERE available=true → 5
# Tx B inserts a new available seat (commits)
# Tx A queries again → 6 (phantom row appeared)
# Relevant for: inventory checks, capacity planning
# LOST UPDATE — two transactions overwrite each other
# Tx A reads account balance: 1000
# Tx B reads account balance: 1000
# Tx A writes: balance = 1000 - 100 = 900 (commits)
# Tx B writes: balance = 1000 - 200 = 800 (commits)
# Expected: 700. Actual: 800. Tx A's write was lost.
// Optimistic locking: no DB locks. Check version on UPDATE.
// If version changed since we read it, throw exception.
// Best for: low-contention scenarios, long-running user workflows
@Entity
public class Product {
@Id private Long id;
@Version // Hibernate manages this automatically
private Long version;
private int stockCount;
}
// Hibernate generates:
// UPDATE products SET stock_count=?, version=version+1
// WHERE id=? AND version=? ← if version mismatch, 0 rows updated → exception
@Service
@Transactional
public class InventoryService {
public void deductStock(Long productId, int quantity) {
Product product = productRepository.findById(productId).orElseThrow();
if (product.getStockCount() < quantity) {
throw new InsufficientStockException();
}
product.setStockCount(product.getStockCount() - quantity);
// If another thread updated stock between our read and write:
// → OptimisticLockingFailureException thrown
// → Caller should retry the operation
}
}
// Pessimistic locking: lock the row in the DB. Others wait.
// Best for: high-contention scenarios, guaranteed exclusivity needed
// PESSIMISTIC_WRITE: SELECT ... FOR UPDATE — blocks other writers and locking
// reads; plain MVCC snapshot reads still see the old row version
@Lock(LockModeType.PESSIMISTIC_WRITE)
@Query("SELECT p FROM Product p WHERE p.id = :id")
Optional<Product> findByIdForUpdate(@Param("id") Long id);
// PESSIMISTIC_READ: SELECT ... FOR SHARE (blocks writes, allows other reads)
@Lock(LockModeType.PESSIMISTIC_READ)
Optional<Product> findById(Long id);
// Usage
@Transactional
public void allocateStock(Long productId, int qty) {
// Row is locked for the entire transaction
Product product = productRepository.findByIdForUpdate(productId).orElseThrow();
if (product.getStockCount() < qty) throw new InsufficientStockException();
product.setStockCount(product.getStockCount() - qty);
// Lock released on commit
}
// Warning: pessimistic locks can cause DEADLOCKS if multiple transactions
// acquire locks in different order. Always lock in a consistent order.
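The consistent-order rule is general — it applies to JVM locks exactly as to row locks. A minimal sketch with plain ReentrantLocks standing in for locked rows:

```java
import java.util.concurrent.locks.ReentrantLock;

public class LockOrderingDemo {
    static final ReentrantLock rowA = new ReentrantLock(); // stands in for row id=1
    static final ReentrantLock rowB = new ReentrantLock(); // stands in for row id=2

    // Every "transaction" acquires locks in the SAME (ascending id) order
    static void allocate(String tx) {
        rowA.lock(); // lowest id first — always
        try {
            rowB.lock();
            try {
                System.out.println(tx + " holds both locks");
            } finally { rowB.unlock(); }
        } finally { rowA.unlock(); }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread t1 = new Thread(() -> allocate("tx1"));
        Thread t2 = new Thread(() -> allocate("tx2"));
        t1.start(); t2.start();
        t1.join(); t2.join();
        // If one thread took rowB before rowA instead, each could hold one lock
        // while waiting forever for the other — the classic deadlock cycle.
        System.out.println("done");
    }
}
```

In SQL terms: if every transaction locks products in ascending id order (e.g., ORDER BY id in the locking SELECT), a wait cycle cannot form.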
JPQL, Criteria API & Native Queries
Spring Data gives you multiple ways to query the database. Each has a different sweet spot — understanding when to use which prevents both over-engineering and under-powered queries.
// Spring Data generates SQL from method names — great for simple queries
public interface OrderRepository extends JpaRepository<Order, Long> {
// SELECT * FROM orders WHERE status = ?
List<Order> findByStatus(OrderStatus status);
// SELECT * FROM orders WHERE customer_id = ? AND status = ?
List<Order> findByCustomerIdAndStatus(Long customerId, OrderStatus status);
// SELECT * FROM orders WHERE total > ?
List<Order> findByTotalGreaterThan(BigDecimal amount);
// SELECT * FROM orders WHERE created_at BETWEEN ? AND ?
List<Order> findByCreatedAtBetween(LocalDateTime from, LocalDateTime to);
// SELECT * FROM orders WHERE LOWER(notes) LIKE LOWER('%keyword%')
List<Order> findByNotesContainingIgnoreCase(String keyword);
// With sorting and pagination
Page<Order> findByStatus(OrderStatus status, Pageable pageable);
// Existence check (SELECT COUNT(*) > 0)
boolean existsByCustomerIdAndStatus(Long customerId, OrderStatus status);
// Count
long countByStatus(OrderStatus status);
// Delete
@Modifying
@Transactional
void deleteByStatusAndCreatedAtBefore(OrderStatus status, LocalDateTime before);
}
public interface OrderRepository extends JpaRepository<Order, Long> {
// JOIN FETCH to avoid N+1
@Query("SELECT o FROM Order o JOIN FETCH o.customer JOIN FETCH o.items WHERE o.id = :id")
Optional<Order> findByIdWithDetails(@Param("id") Long id);
// DTO projection — only loads needed fields
@Query("""
SELECT new com.myapp.dto.OrderSummaryDto(
o.id, c.name, o.total, o.status, o.createdAt
)
FROM Order o JOIN o.customer c
WHERE o.status = :status
ORDER BY o.createdAt DESC
""")
List<OrderSummaryDto> findSummaryByStatus(@Param("status") OrderStatus status);
// Bulk update — much faster than loading + updating entities one by one
@Modifying
@Transactional
@Query("UPDATE Order o SET o.status = :newStatus WHERE o.status = :oldStatus AND o.createdAt < :before")
int bulkUpdateStatus(@Param("oldStatus") OrderStatus old,
@Param("newStatus") OrderStatus newStatus,
@Param("before") LocalDateTime before);
// Pagination with count query optimized separately
@Query(value = "SELECT o FROM Order o JOIN FETCH o.customer WHERE o.status = :status",
countQuery = "SELECT COUNT(o) FROM Order o WHERE o.status = :status")
Page<Order> findByStatusWithCustomer(@Param("status") OrderStatus status, Pageable pageable);
}
// Use native SQL for: database-specific features, complex analytics,
// window functions, CTEs, full-text search, JSONB queries
@Query(value = """
WITH ranked_orders AS (
SELECT o.*,
ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY created_at DESC) AS rn
FROM orders o
WHERE status = :status
)
SELECT * FROM ranked_orders WHERE rn = 1
""", nativeQuery = true)
List<Object[]> findLatestOrderPerCustomer(@Param("status") String status);
// Map to interface projection (cleaner than Object[])
@Query(value = """
SELECT o.id, o.total, c.name AS customer_name,
COUNT(oi.id) AS item_count
FROM orders o
JOIN customers c ON o.customer_id = c.id
JOIN order_items oi ON oi.order_id = o.id
WHERE o.created_at >= :from
GROUP BY o.id, o.total, c.name
ORDER BY o.total DESC
""", nativeQuery = true)
List<OrderReportProjection> getOrderReport(@Param("from") LocalDateTime from);
// Interface projection — Spring creates a proxy
public interface OrderReportProjection {
Long getId();
BigDecimal getTotal();
String getCustomerName();
int getItemCount();
}
// Interface projection — Spring generates proxy, only selected columns loaded
public interface ProductSummary {
Long getId();
String getName();
BigDecimal getPrice();
// Computed expression — NOTE: a SpEL @Value makes this an "open" projection,
// so Spring falls back to loading the full entity, not just three columns
@Value("#{target.price * 1.2}")
BigDecimal getPriceWithTax();
}
// Spring Data automatically projects to interface
List<ProductSummary> findByCategory(String category);
// For a closed projection (no @Value): SELECT id, name, price FROM products WHERE category = ?
// (NOT SELECT * — only the columns the projection declares)
// Record projection (Hibernate 6+ / Spring Boot 3+)
public record ProductSummaryDto(Long id, String name, BigDecimal price) {}
// Works with derived queries and @Query
List<ProductSummaryDto> findByCategory(String category);
Connection Pooling with HikariCP
Every database interaction requires a connection. Creating a new TCP connection per request would be catastrophic — it takes 20–100ms. Connection pools maintain a set of open connections and hand them out on demand. HikariCP is Spring Boot's default and the fastest Java connection pool available.
A common misconception: "more connections = better performance." The opposite is often true. Database servers have limited resources. Too many connections cause context-switching overhead and memory pressure on the DB. The right pool size depends on your workload — too small causes timeouts, too large causes DB overload.
spring:
datasource:
hikari:
# Pool size — formula: (core_count * 2) + effective_spindle_count
# For I/O-bound workloads: 10-20 is often ideal
maximum-pool-size: 20
minimum-idle: 5 # keep 5 warm connections always
# Timeout settings
connection-timeout: 30000 # 30s max wait to get a connection from pool
idle-timeout: 600000 # 10min — remove idle connections
max-lifetime: 1800000 # 30min — recycle connections (prevent stale)
keepalive-time: 60000 # ping idle connections every 60s
# Validation
connection-test-query: SELECT 1 # only for legacy pre-JDBC4 drivers — modern drivers validate via isValid()
# Pool name — appears in logs and JMX
pool-name: OrderServicePool
# Leak detection — logs stack trace if connection held > this duration
leak-detection-threshold: 5000 # 5 seconds
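The sizing rule of thumb from the comment above, as plain arithmetic. Inputs are illustrative — an SSD is usually treated as roughly one effective spindle, and I/O-bound services often do fine at 10–20 connections regardless:

```java
public class PoolSizeFormula {
    static int recommendedPoolSize(int coreCount, int effectiveSpindleCount) {
        return coreCount * 2 + effectiveSpindleCount; // the rule of thumb above
    }

    public static void main(String[] args) {
        System.out.println(recommendedPoolSize(8, 1));  // 8-core DB host on SSD
        System.out.println(recommendedPoolSize(16, 1)); // 16-core DB host on SSD
    }
}
```

Note the inputs are the database server's cores, not the application's — the pool exists to avoid overwhelming the DB.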
Connection Pool Exhaustion
When all pool connections are in use, new requests wait up to connection-timeout then fail with SQLTimeoutException: Unable to acquire JDBC Connection. This cascades: slow DB queries hold connections longer → more requests queue → server appears down. Root causes: slow queries, N+1 problems, transactions left open too long, or pool sized too small for traffic.
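Little's law explains the cascade: connections in use ~= arrival rate x time each request holds a connection. A quick illustration with made-up numbers:

```java
public class PoolExhaustionMath {
    public static void main(String[] args) {
        int poolSize = 20;
        double requestsPerSec = 50; // illustrative traffic level
        for (double holdMs : new double[]{100, 400, 2000}) {
            double inUse = requestsPerSec * holdMs / 1000.0; // Little's law: L = lambda * W
            String flag = inUse > poolSize ? "  EXHAUSTED: requests queue, then time out" : "";
            System.out.printf("hold=%.0fms -> ~%.0f connections in use (pool=%d)%s%n",
                    holdMs, inUse, poolSize, flag);
        }
    }
}
```

A query that slows from 100 ms to 2 s multiplies connection demand twentyfold at the same traffic — which is why the fix is usually the query, not a bigger pool.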
// HikariCP exposes metrics via Micrometer — view in Actuator
// GET /actuator/metrics/hikaricp.connections.active
// GET /actuator/metrics/hikaricp.connections.pending
// GET /actuator/metrics/hikaricp.connections.idle
// In application.yml, expose via Prometheus:
// management.endpoints.web.exposure.include: metrics, prometheus
// Key metrics to alert on:
// hikaricp.connections.pending > 0 — connections are being waited for
// hikaricp.connections.active / hikaricp.connections.max > 0.8 — 80% utilization
// hikaricp.connections.timeout.total increasing — requests timing out
// Programmatic access to pool statistics
import com.zaxxer.hikari.HikariDataSource;
import com.zaxxer.hikari.HikariPoolMXBean;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;

@Component
public class PoolHealthChecker {

    @Autowired
    private HikariDataSource dataSource;

    // Simple carrier for the four numbers worth watching
    public record PoolStats(int active, int idle, int awaiting, int total) {}

    public PoolStats getStats() {
        HikariPoolMXBean pool = dataSource.getHikariPoolMXBean();
        return new PoolStats(
            pool.getActiveConnections(),
            pool.getIdleConnections(),
            pool.getThreadsAwaitingConnection(),
            pool.getTotalConnections()
        );
    }
}
Schema Migrations — Flyway & Liquibase
Never use hibernate.ddl-auto=create or update in anything beyond local development. Production schemas must be managed with version-controlled migration scripts that are repeatable, auditable, and reversible. Flyway and Liquibase are the two industry standards.
Flyway
SQL-first migrations. Simple, explicit, easy to audit. Each migration is a .sql file with a version number. Hard to undo — design for forward-only changes.
Liquibase
XML/YAML/JSON changesets with built-in rollback support. More complex but supports undo operations and multi-database abstraction. Preferred for large enterprise teams.
spring:
  flyway:
    enabled: true
    locations: classpath:db/migration   # where migration files live
    baseline-on-migrate: true           # allow migrating existing DBs
    validate-on-migrate: true           # validate checksums on startup
    out-of-order: false                 # strict version ordering
  jpa:
    hibernate:
      ddl-auto: validate                # Hibernate validates against the migrated schema
src/main/resources/db/migration/
V1__create_customers_table.sql
V2__create_orders_table.sql
V3__add_order_status_index.sql
V4__add_customer_email_unique.sql
V5__add_products_table.sql
V6__add_shipping_address_to_orders.sql
V7__add_order_items_table.sql
-- V2__create_orders_table.sql
CREATE TABLE orders (
id BIGSERIAL PRIMARY KEY,
customer_id BIGINT NOT NULL REFERENCES customers(id),
status VARCHAR(20) NOT NULL DEFAULT 'PENDING',
total NUMERIC(12, 2) NOT NULL DEFAULT 0,
notes TEXT,
version BIGINT NOT NULL DEFAULT 0,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_orders_customer_id ON orders(customer_id);

-- V3__add_order_status_index.sql
CREATE INDEX idx_orders_status_created ON orders(status, created_at DESC);
-- V7__add_order_items_table.sql
CREATE TABLE order_items (
id BIGSERIAL PRIMARY KEY,
order_id BIGINT NOT NULL REFERENCES orders(id) ON DELETE CASCADE,
product_id BIGINT NOT NULL REFERENCES products(id),
quantity INT NOT NULL CHECK (quantity > 0),
unit_price NUMERIC(12, 2) NOT NULL
);
CREATE INDEX idx_order_items_order_id ON order_items(order_id);
Flyway checksums each migration file. If you modify a file that's already been applied to any environment, Flyway will refuse to start with a checksum mismatch error. Always create a new migration to fix a previous one. This is not a limitation — it's an audit trail.
For live production deployments: (1) Add columns as nullable first — existing rows won't fail. (2) Never remove columns in the same release that removes the code using them — do it in the next release. (3) Add indexes CONCURRENTLY in PostgreSQL to avoid table locks. (4) Rename columns with the expand/contract pattern: add the new column, dual-write to both while backfilling, switch reads to the new column, then drop the old one in a later release.
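The column-rename advice above can be sketched as a sequence of migrations. The rename itself (orders.total to orders.total_amount) is invented purely for illustration:

```sql
-- Release 1: expand — add the new column as nullable so existing rows don't fail
ALTER TABLE orders ADD COLUMN total_amount NUMERIC(12, 2);

-- Backfill while the application dual-writes both columns
UPDATE orders SET total_amount = total WHERE total_amount IS NULL;

-- Release 2: reads have switched to total_amount; now enforce the constraint
ALTER TABLE orders ALTER COLUMN total_amount SET NOT NULL;

-- Release 3: contract — drop the old column once no deployed code references it
ALTER TABLE orders DROP COLUMN total;
```

Each step ships in its own release, so at every point the running code and the schema agree.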
Query Optimization & Indexing
A slow query in production is a crisis. Understanding how to diagnose, index, and rewrite slow queries is one of the highest-value skills a backend engineer can have.
-- Always use EXPLAIN ANALYZE in PostgreSQL to diagnose slow queries
EXPLAIN ANALYZE
SELECT o.*, c.name
FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE o.status = 'PENDING'
ORDER BY o.created_at DESC
LIMIT 20;
-- Look for:
-- "Seq Scan" on large tables → missing index
-- "Rows Removed by Filter: 50000" → poor selectivity
-- high "actual time" → bottleneck found
-- "Nested Loop" with large outer rows → join optimization needed
-- Bad output (no index):
-- Seq Scan on orders (cost=0.00..45231.00 rows=123 width=200)
-- (actual time=0.1..2340.5 rows=123 loops=1)
-- Filter: (status = 'PENDING')
-- Rows Removed by Filter: 500000
-- Good output (with index):
-- Index Scan using idx_orders_status_created on orders
-- (actual time=0.05..1.2 rows=123 loops=1)
-- Index Cond: (status = 'PENDING')
Indexing Strategy
-- ALWAYS INDEX: foreign keys (Postgres doesn't auto-index them like MySQL)
CREATE INDEX idx_orders_customer_id ON orders(customer_id);
-- Composite index — column ORDER matters. Left-prefix rule.
-- This index supports: WHERE status=? AND created_at>?
-- WHERE status=?
-- NOT: WHERE created_at>? (status must be first)
CREATE INDEX idx_orders_status_created ON orders(status, created_at DESC);
-- Partial index — smaller, faster for filtered queries
CREATE INDEX idx_orders_pending ON orders(created_at DESC)
WHERE status = 'PENDING';
-- Only indexes pending orders — much smaller than a full index
-- Covering index — includes extra columns to avoid table lookups
CREATE INDEX idx_orders_customer_covering
ON orders(customer_id) INCLUDE (status, total, created_at);
-- Query can be answered from index alone — no heap access
-- CONCURRENTLY — add index without locking the table (Postgres)
CREATE INDEX CONCURRENTLY idx_customers_email_lookup ON customers(email);
-- Takes longer but production traffic is not blocked
Every index slows down INSERT/UPDATE/DELETE operations because the index must be maintained. A table with 15 indexes takes 3–5× longer to insert into than one with 2. Index only columns that appear in WHERE clauses of frequent queries, join conditions, and ORDER BY on large result sets. Drop unused indexes — PostgreSQL's pg_stat_user_indexes shows index usage statistics.
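The pg_stat_user_indexes view mentioned above can be queried directly to find candidates for removal. A sketch (idx_scan counts index scans since statistics were last reset, so run this only after a representative traffic period):

```sql
-- Indexes that have never been scanned since the last stats reset,
-- largest first — prime candidates for dropping
SELECT schemaname,
       relname       AS table_name,
       indexrelname  AS index_name,
       idx_scan,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;
```

Before dropping, confirm the index isn't backing a unique constraint or used only by rare-but-critical queries (month-end reports, for example).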
When NOT to Use JPA
JPA is powerful but not the right tool for every job. Knowing when to bypass it is a sign of engineering maturity. Forcing everything through Hibernate can be 10–100× slower than well-written SQL for certain workloads.
Bulk Inserts & Updates
JPA loads entities into memory, runs dirty checking, and generates individual statements. Use Spring's JdbcTemplate.batchUpdate() or a COPY command for bulk operations — 10–100× faster.
Complex Reporting & Analytics
Window functions, CTEs, GROUP BY with HAVING, ROLLUP — natural in SQL but awkward or impossible in JPQL. Use @Query(nativeQuery=true) or JdbcTemplate for analytics queries.
Reading a Single Column
Loading a full entity just to check one column is wasteful. Use a scalar projection — @Query("SELECT o.status FROM Order o WHERE o.id = :id") — or an interface projection.
Database-Specific Features
PostgreSQL's JSONB operators, tsvector full-text search, and array types don't map cleanly to JPA. Use native queries or Spring Data's custom query support.
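To make the database-specific point concrete, here are a few PostgreSQL queries with no JPQL equivalent. The attributes, search_vector, and tags columns are hypothetical; in a Spring app these would run via @Query(nativeQuery = true) or JdbcTemplate:

```sql
-- JSONB: products whose 'color' attribute is 'red' (->> extracts as text)
SELECT * FROM products WHERE attributes ->> 'color' = 'red';

-- Full-text search against a precomputed tsvector column
SELECT * FROM products
WHERE search_vector @@ to_tsquery('english', 'wireless & keyboard');

-- Array containment: the tags array contains both values
SELECT * FROM products WHERE tags @> ARRAY['sale', 'clearance'];
```

Each of these can also use a specialized index (GIN on the JSONB, tsvector, or array column), which JPA has no vocabulary for at all.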
@Service
@RequiredArgsConstructor
public class BulkImportService {
private final JdbcTemplate jdbcTemplate;
// Insert 100,000 rows efficiently
public void bulkInsertProducts(List<ProductImportDto> products) {
String sql = "INSERT INTO products (name, sku, price, category) VALUES (?,?,?,?)";
jdbcTemplate.batchUpdate(sql, products, 1000, (ps, product) -> {
ps.setString(1, product.name());
ps.setString(2, product.sku());
ps.setBigDecimal(3, product.price());
ps.setString(4, product.category());
});
// Sends 100 batches of 1000 rows each — MUCH faster than 100,000 individual inserts
}
// Complex analytics query — not possible in JPQL
public List<RevenueByCategory> getMonthlyRevenue(int year) {
return jdbcTemplate.query("""
SELECT c.name AS category,
EXTRACT(MONTH FROM o.created_at) AS month,
SUM(oi.quantity * oi.unit_price) AS revenue
FROM orders o
JOIN order_items oi ON oi.order_id = o.id
JOIN products p ON oi.product_id = p.id
JOIN categories c ON p.category_id = c.id
WHERE EXTRACT(YEAR FROM o.created_at) = ?
AND o.status = 'COMPLETED'
GROUP BY c.name, month
ORDER BY month, revenue DESC
""", (rs, row) -> new RevenueByCategory(
rs.getString("category"),
rs.getInt("month"),
rs.getBigDecimal("revenue")
), year);
}
}
Quick check: a @Transactional service method loads an entity, modifies a field, but never calls save(). What happens? The change is still persisted. At commit, Hibernate flushes the persistence context, compares each managed entity against the snapshot taken when it was loaded, and issues an UPDATE for anything that changed — no save() call needed. This is automatic dirty checking and is one of JPA's most important features.
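Conceptually, dirty checking is snapshot-and-diff. The following is a simplified illustration of the idea, not Hibernate's actual implementation (all names are invented):

```java
import java.util.HashMap;
import java.util.Map;

public class DirtyCheckDemo {

    static class Order {
        String status;
        Order(String status) { this.status = status; }
    }

    // Stand-in for the persistence context: keyed by object identity,
    // just like Hibernate's first-level cache
    static class MiniPersistenceContext {
        private final Map<Order, String> snapshots = new HashMap<>();

        void load(Order order) {
            snapshots.put(order, order.status); // snapshot taken at load time
        }

        boolean isDirty(Order order) {
            // at flush, current state is compared against the snapshot
            return !snapshots.get(order).equals(order.status);
        }
    }

    public static void main(String[] args) {
        MiniPersistenceContext ctx = new MiniPersistenceContext();
        Order order = new Order("PENDING");
        ctx.load(order);
        order.status = "COMPLETED";              // mutate the managed entity
        System.out.println(ctx.isDirty(order));  // prints true -> UPDATE would be issued
    }
}
```

Hibernate does the real thing per field, with bytecode tricks and batching, but the mental model is the same: mutate a managed entity inside a transaction and the UPDATE happens at flush.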
Interview Questions
How do you avoid LazyInitializationException in production?
Set spring.jpa.open-in-view=false, map entities to DTOs inside the @Transactional service, and use explicit fetch joins for the associations you need.
Why didn't my transaction roll back on a checked exception?
Spring's @Transactional rolls back only on unchecked exceptions by default. Use @Transactional(rollbackFor = Exception.class) or throw a RuntimeException wrapper.
Optimistic vs pessimistic locking — when do you use each?
Optimistic locking (@Version) assumes conflicts are rare — it doesn't lock rows, instead checking the version number on UPDATE. If the version changed since the read, it throws OptimisticLockingFailureException and the caller retries. Best for user-facing workflows with low conflict probability. Pessimistic locking (SELECT FOR UPDATE) locks the row in the database immediately, making other transactions wait. Best for inventory reservation and financial operations where concurrent modification cannot be allowed. Pessimistic locking risks deadlocks if transactions acquire locks in different orders.
Why is ddl-auto=update unsafe in production?
It only adds columns and tables — it never removes them or changes column types safely. It can't handle data migrations. It runs DDL at application startup, delaying or blocking it. It has no version history or audit trail. Most critically, it's unpredictable when multiple instances deploy concurrently. Use Flyway or Liquibase: explicit, versioned, auditable migrations that run exactly once and fail loudly if something goes wrong.