Database Engineering with Spring Boot
Most backend failures trace back to the database — N+1 queries silently destroying performance, transactions with wrong isolation levels causing data corruption, connection pools exhausted under load. This section teaches database engineering the way senior engineers think about it: from JPA internals to production query optimization.
- JPA & Hibernate Internals
- Entity Mapping — Done Right
- Entity Lifecycle & Persistence Context
- Relationships — OneToMany, ManyToOne, ManyToMany
- Lazy vs Eager Loading
- The N+1 Problem — Detection & Fix
- Transactions & @Transactional
- Isolation Levels & Locking
- JPQL, Criteria API & Native Queries
- Connection Pooling with HikariCP
- Schema Migrations — Flyway & Liquibase
- Query Optimization & Indexing
- When NOT to Use JPA
JPA & Hibernate Internals
JPA (Java Persistence API) is a specification — a set of interfaces. Hibernate is the most popular implementation. Spring Data JPA wraps Hibernate with repository abstractions. Understanding this three-layer stack is essential: when something goes wrong, you need to know which layer to debug.
JPA (Specification)
Defines interfaces: EntityManager, @Entity, @Table, JPQL syntax. A contract — no implementation code.
Hibernate (ORM)
Implements JPA. Handles SQL generation, caching, dirty checking, proxies, and schema generation. The actual engine doing the work.
Spring Data JPA
Wraps Hibernate with JpaRepository, derived query methods, pagination, and auto-transaction management. Reduces boilerplate dramatically.
What Hibernate Does For You — And What It Hides
Hibernate hides SQL behind objects, which speeds development — but it also hides how many SQL statements run, when they run, and what indexes they use. Engineers who treat Hibernate as a black box eventually ship slow applications. Always watch the SQL Hibernate generates.
spring:
jpa:
show-sql: true # prints SQL to stdout
properties:
hibernate:
format_sql: true # pretty-print SQL
use_sql_comments: true # adds HQL source as comment
logging:
level:
org.hibernate.SQL: DEBUG # see all SQL
org.hibernate.type.descriptor.sql: TRACE # bind parameter values (Hibernate 5)
org.hibernate.orm.jdbc.bind: TRACE # bind parameter values (Hibernate 6 / Boot 3)
The Persistence Context Explained
The Persistence Context is Hibernate's first-level cache and change-tracking unit. It lives for the duration of a @Transactional method. Every entity you load becomes "managed" — Hibernate tracks every field change and flushes them to the database at the right time.
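A toy model makes the mechanism concrete: snapshot state at load, diff at flush. This is not Hibernate code — every name below is invented for illustration:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Toy model of dirty checking: the persistence context snapshots each entity's
// state at load time; at flush, entities whose current state differs from the
// snapshot get an UPDATE.
class ToyPersistenceContext {
    private final Map<Object, Map<String, Object>> snapshots = new IdentityHashMap<>();

    void manage(Object entity, Map<String, Object> state) {
        snapshots.put(entity, new HashMap<>(state)); // defensive copy = the snapshot
    }

    List<Object> findDirty(Function<Object, Map<String, Object>> currentState) {
        List<Object> dirty = new ArrayList<>();
        for (Map.Entry<Object, Map<String, Object>> e : snapshots.entrySet()) {
            if (!e.getValue().equals(currentState.apply(e.getKey()))) {
                dirty.add(e.getKey()); // changed since load → needs an UPDATE
            }
        }
        return dirty;
    }
}

public class DirtyCheckingDemo {
    static class Order { String status; Order(String s) { status = s; } }

    public static void main(String[] args) {
        ToyPersistenceContext ctx = new ToyPersistenceContext();
        Order a = new Order("PENDING");
        Order b = new Order("PENDING");
        ctx.manage(a, Map.of("status", a.status));
        ctx.manage(b, Map.of("status", b.status));

        a.status = "PROCESSING"; // plain field mutation — no save() call

        List<Object> dirty = ctx.findDirty(o -> Map.of("status", ((Order) o).status));
        System.out.println(dirty.size()); // only 'a' gets an UPDATE at flush
    }
}
```

This is why managed entities need no explicit save(): the flush step discovers the change by comparison, not by notification.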
Entity Mapping — Done Right
Poor entity mapping is the root cause of many Hibernate problems — wrong fetch types, missing indexes, bad ID generation strategies. Get this right from day one.
@Entity
@Table(
name = "orders",
indexes = {
@Index(name = "idx_orders_customer_id", columnList = "customer_id"),
@Index(name = "idx_orders_status_created", columnList = "status, created_at"),
@Index(name = "idx_orders_created_at", columnList = "created_at DESC")
}
)
@EntityListeners(AuditingEntityListener.class) // Spring Data auditing
public class Order {
@Id
@GeneratedValue(strategy = GenerationType.SEQUENCE,
generator = "order_seq")
@SequenceGenerator(name = "order_seq", sequenceName = "order_id_seq",
allocationSize = 50) // batch 50 IDs per DB call
private Long id;
// Store enums as strings, not ordinals — reordering the enum would
// silently change what already-stored numbers mean
@Enumerated(EnumType.STRING)
@Column(nullable = false, length = 20)
private OrderStatus status;
@Column(nullable = false, precision = 12, scale = 2)
private BigDecimal total;
// Many-to-one: always LAZY — never load the whole customer for every order query
@ManyToOne(fetch = FetchType.LAZY, optional = false)
@JoinColumn(name = "customer_id", nullable = false)
private Customer customer;
// One-to-many: always LAZY
@OneToMany(mappedBy = "order", cascade = CascadeType.ALL,
orphanRemoval = true, fetch = FetchType.LAZY)
private List<OrderItem> items = new ArrayList<>();
// Audit fields — populated automatically by Spring Data
@CreatedDate
@Column(nullable = false, updatable = false)
private LocalDateTime createdAt;
@LastModifiedDate
@Column(nullable = false)
private LocalDateTime updatedAt;
// Optimistic locking — prevents lost updates
@Version
private Long version;
// equals/hashCode must survive id assignment on persist:
// compare by id only when present, keep hashCode constant
@Override
public boolean equals(Object o) {
if (this == o) return true;
if (!(o instanceof Order)) return false;
Order other = (Order) o;
return id != null && id.equals(other.getId());
}
@Override
public int hashCode() { return getClass().hashCode(); }
}
ID Generation Strategies
SEQUENCE is the best strategy for databases with native sequences (PostgreSQL, Oracle). Hibernate pre-allocates IDs in blocks (allocationSize), so one sequence round trip covers 50 inserts — and JDBC batch inserts remain possible.
@GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "order_seq")
@SequenceGenerator(name = "order_seq", sequenceName = "order_id_seq",
allocationSize = 50) // fetch 50 IDs at once — 1 DB roundtrip per 50 inserts
private Long id;
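The effect of allocationSize is simple arithmetic (a sketch — the pooled optimizer's exact protocol differs slightly, but the round-trip count is the point):

```java
public class AllocationSizeMath {
    // ceil(inserts / allocationSize): each sequence round trip yields a block of IDs
    static long sequenceCalls(long inserts, long allocationSize) {
        return (inserts + allocationSize - 1) / allocationSize;
    }

    public static void main(String[] args) {
        System.out.println(sequenceCalls(10_000, 1));  // no pre-allocation: one call per insert
        System.out.println(sequenceCalls(10_000, 50)); // allocationSize = 50
    }
}
```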
IDENTITY uses database auto-increment (AUTO_INCREMENT in MySQL, SERIAL in PostgreSQL). Simple but disables JDBC batch inserts — Hibernate needs to know the ID immediately after each insert.
With IDENTITY, Hibernate cannot batch INSERT statements because it needs the generated ID from each insert before it can continue. If you're doing bulk inserts, use SEQUENCE instead.
UUID is ideal for distributed systems where IDs must be globally unique without database coordination. Downside: larger storage (16 bytes vs 8), random insertion order causes B-tree index fragmentation.
@Id
@GeneratedValue(strategy = GenerationType.UUID) // Hibernate 6+
private UUID id;
// Or for time-ordered UUIDs (better index locality):
@Id
@UuidGenerator(style = UuidGenerator.Style.TIME) // Hibernate 6 time-based style
private UUID id;
GenerationType.TABLE uses a dedicated database table for ID generation with a pessimistic lock on every insert. This creates a single-row bottleneck that serializes all inserts across your entire application. It will destroy throughput under any meaningful load.
Entity Lifecycle & Persistence Context
Every JPA entity exists in one of four states. Knowing which state your entity is in prevents the most common Hibernate bugs: unexpected updates, detached entity exceptions, and missing data.
TRANSIENT
New object, not associated with any persistence context. No DB row. Hibernate doesn't know it exists. new Order() is transient.
MANAGED
Associated with an open persistence context. Hibernate tracks all changes. Any field mutation will produce an UPDATE on flush.
DETACHED
Was managed, now the persistence context closed. Has a DB row but changes are no longer tracked. Common after @Transactional method returns.
REMOVED
Scheduled for deletion. Still in the persistence context, will be DELETE-d on flush/commit.
@Service
@Transactional
public class OrderService {
@PersistenceContext private EntityManager entityManager;
@Autowired private OrderRepository orderRepository;
public void demonstrateLifecycle(Long orderId) {
// TRANSIENT — not yet known to Hibernate
Order newOrder = new Order();
newOrder.setStatus(OrderStatus.PENDING);
// TRANSIENT → MANAGED (scheduled for INSERT on flush)
entityManager.persist(newOrder);
// MANAGED — load from DB, entity tracked by persistence context
Order existing = orderRepository.findById(orderId).orElseThrow();
// Still MANAGED — this change is tracked, UPDATE will run on flush
existing.setStatus(OrderStatus.PROCESSING);
// NO save() call needed — dirty checking handles it automatically!
// MANAGED → REMOVED (DELETE on flush)
entityManager.remove(existing);
// DETACHED entity re-attached — Hibernate does a SELECT then UPDATE
Order detached = getDetachedOrderFromSomewhere();
Order reattached = entityManager.merge(detached);
}
// @Transactional ends → flush → SQL executes → context closes → all entities DETACHED
}
// COMMON BUG: modifying a DETACHED entity
@Service
public class BuggyService {
public Order getOrder(Long id) {
return orderRepository.findById(id).orElseThrow();
// Returns a DETACHED entity — the repository's own transaction ended when findById returned
}
public void updateOrder(Order order) {
order.setStatus(OrderStatus.SHIPPED);
// BUG: order is DETACHED — this change is SILENTLY IGNORED
// You must call orderRepository.save(order) to re-attach and update
}
}
Relationships — OneToMany, ManyToOne, ManyToMany
JPA relationships are powerful but carry hidden costs. Every relationship annotation is a decision about SQL joins, memory usage, and query complexity. Map them wrong and you'll spend months fighting performance issues.
// Parent side — use mappedBy to avoid a join table
@Entity
public class Order {
@OneToMany(
mappedBy = "order", // field name in OrderItem that owns the FK
cascade = CascadeType.ALL, // persist/merge/remove propagates to items
orphanRemoval = true, // delete item when removed from collection
fetch = FetchType.LAZY // ALWAYS LAZY for collections
)
private List<OrderItem> items = new ArrayList<>();
// Helper methods to keep both sides in sync
public void addItem(OrderItem item) {
items.add(item);
item.setOrder(this);
}
public void removeItem(OrderItem item) {
items.remove(item);
item.setOrder(null);
}
}
// Child side — owns the foreign key column
@Entity
public class OrderItem {
@ManyToOne(fetch = FetchType.LAZY)
@JoinColumn(name = "order_id", nullable = false)
private Order order;
}
// ManyToOne is EAGER by default — change it immediately
@ManyToOne(fetch = FetchType.LAZY) // ALWAYS override to LAZY
@JoinColumn(name = "customer_id", nullable = false)
private Customer customer;
// WHY: if you load 100 orders with EAGER customer,
// Hibernate loads 100 customers too — even if you never use them.
// LAZY means: Customer proxy created, actual SELECT only when you call order.getCustomer().getName()
// ManyToMany uses a join table automatically
@Entity
public class Product {
@ManyToMany(fetch = FetchType.LAZY)
@JoinTable(
name = "product_tags",
joinColumns = @JoinColumn(name = "product_id"),
inverseJoinColumns = @JoinColumn(name = "tag_id")
)
private Set<Tag> tags = new HashSet<>();
// Use Set not List for ManyToMany — avoids duplicate join table rows
}
// Better approach for ManyToMany with extra columns on the join table:
// Create an explicit junction entity (ProductTag) with its own @Id
// This gives you full control over the join table and its data
// CascadeType.PERSIST — when you persist parent, children are persisted
// CascadeType.MERGE — when you merge parent, children are merged
// CascadeType.REMOVE — when you delete parent, children are deleted
// CascadeType.REFRESH — when you refresh parent, children are refreshed
// CascadeType.DETACH — when parent detaches, children detach
// CascadeType.ALL — all of the above
// SAFE: cascade ALL for true parent-child (Order → OrderItems)
@OneToMany(cascade = CascadeType.ALL, orphanRemoval = true)
private List<OrderItem> items;
// DANGEROUS: cascade REMOVE for shared entities
// Deleting an Order should NOT delete the Customer!
@ManyToOne(cascade = CascadeType.REMOVE) // WRONG — deletes customer!
private Customer customer;
// RULE: CascadeType.ALL only for private ownership relationships
// Never cascade to entities that can be referenced by other parents
Lazy vs Eager Loading
This is one of the most misunderstood topics in JPA. Getting the defaults wrong causes either LazyInitializationException in production or silent performance disasters where Hibernate loads megabytes of data you never use.
LAZY (default for collections)
A proxy is returned. The actual SQL runs only when you first access the collection. Zero cost if you never access it. Throws LazyInitializationException if accessed outside a transaction.
EAGER (default for @ManyToOne)
Related data is loaded in the same query. Always costs a JOIN, even if you never use the data. Fine for single objects, catastrophic for collections.
The LazyInitializationException
LazyInitializationException: could not initialize proxy — no Session means you tried to access a lazy collection after the Hibernate session (persistence context) was closed. This typically happens when you return an entity from a @Transactional service method and then access its lazy associations in the controller or view layer.
// ── THE PROBLEM ──────────────────────────────────────────────
@Service
public class OrderService {
@Transactional
public Order getOrder(Long id) {
return orderRepository.findById(id).orElseThrow();
// @Transactional ends here — session CLOSES
}
}
@RestController
public class OrderController {
@GetMapping("/orders/{id}")
public OrderDto getOrder(@PathVariable Long id) {
Order order = orderService.getOrder(id);
order.getItems().size(); // BOOM: LazyInitializationException
// order.items is a proxy — the session is already closed
return toDto(order); // never reached — the line above throws (toDto illustrative)
}
}
// ── FIX 1: Fetch join in the query (BEST for known access patterns) ──
@Query("SELECT o FROM Order o LEFT JOIN FETCH o.items WHERE o.id = :id")
Optional<Order> findByIdWithItems(@Param("id") Long id);
// ── FIX 2: Use @EntityGraph (cleaner than JPQL join fetch) ────────────
@EntityGraph(attributePaths = {"items", "items.product", "customer"})
Optional<Order> findById(Long id);
// ── FIX 3: Map to DTO inside the transaction (BEST for APIs) ──────────
@Service
public class OrderService {
@Transactional(readOnly = true)
public OrderDetailDto getOrder(Long id) {
Order order = orderRepository.findByIdWithItems(id).orElseThrow();
return orderMapper.toDetailDto(order);
// DTO created inside transaction — no lazy loading issues ever
}
}
// ── WRONG FIX: spring.jpa.open-in-view=true (default!) ───────────────
// Keeps the session open through the HTTP request lifecycle
// Allows lazy loading in controllers — but silently fires N+1 queries
// ALWAYS set: spring.jpa.open-in-view=false in production
spring.jpa.open-in-view=true is Spring Boot's default. It keeps the Hibernate session open for the entire HTTP request so lazy loading "just works" in controllers. In production this silently fires extra SQL queries at the worst possible time — during response serialization. Always set spring.jpa.open-in-view=false and handle loading explicitly.
The N+1 Problem — Detection & Fix
The N+1 problem is the most common Hibernate performance killer. It silently fires hundreds of SQL queries when you expected one. A page that loads in 50ms with 10 orders starts taking 5 seconds with 1,000 orders — and nobody notices until production.
You run 1 query to get N orders. Then for each order, Hibernate runs 1 query to get its customer. Total: 1 + N queries. With 500 orders, that's 501 SQL statements instead of 1 JOIN query. Each query is fast but the round-trip overhead multiplies catastrophically.
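The arithmetic is worth spelling out. Assuming roughly 1 ms of round-trip latency per query (an illustrative number — real latency varies with network and load):

```java
public class NPlusOneCost {
    public static void main(String[] args) {
        double rttMs = 1.0; // assumed per-query round trip — an assumption, varies widely
        for (int n : new int[]{10, 500, 10_000}) {
            int queries = 1 + n;             // 1 list query + 1 per row
            double wallMs = queries * rttMs; // latency dominates, not query work
            System.out.printf("%6d orders -> %6d queries, ~%.0f ms of round trips%n",
                    n, queries, wallMs);
        }
        // A single JOIN FETCH query pays the round trip once, regardless of n.
    }
}
```

Even at an optimistic 1 ms per round trip, 10,000 orders cost ~10 seconds before the database does any real work; slower links make it far worse.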
// This looks innocent — but generates N+1 queries
@GetMapping("/orders/summary")
public List<OrderSummaryDto> getOrders() {
List<Order> orders = orderRepository.findAll(); // SELECT * FROM orders (1 query)
return orders.stream()
.map(o -> new OrderSummaryDto(
o.getId(),
o.getCustomer().getName(), // SELECT * FROM customers WHERE id=? (1 per order!)
o.getTotal()
))
.toList();
}
// With 500 orders: 1 + 500 = 501 queries
// With 10,000 orders: 10,001 queries — minutes of wall time
// ── FIX 1: JOIN FETCH ────────────────────────────────────────────────
@Query("SELECT o FROM Order o JOIN FETCH o.customer WHERE o.status = :status")
List<Order> findByStatusWithCustomer(@Param("status") OrderStatus status);
// Generates: SELECT o.*, c.* FROM orders o INNER JOIN customers c ON o.customer_id = c.id
// = 1 query regardless of order count
// ── FIX 2: @EntityGraph ─────────────────────────────────────────────
@EntityGraph(attributePaths = {"customer"})
List<Order> findByStatus(OrderStatus status);
// ── FIX 3: @BatchSize (for collections where join fetch multiplies rows) ──
@OneToMany(mappedBy = "order", fetch = FetchType.LAZY)
@BatchSize(size = 50) // loads items in batches of 50 IDs: WHERE order_id IN (?,?,?...)
private List<OrderItem> items;
// ── FIX 4: Project to DTO in JPQL — avoids entity loading entirely ────
@Query("""
SELECT new com.myapp.dto.OrderSummaryDto(
o.id, c.name, o.total, o.status, o.createdAt
)
FROM Order o JOIN o.customer c
WHERE o.status = :status
ORDER BY o.createdAt DESC
""")
List<OrderSummaryDto> findSummaryByStatus(@Param("status") OrderStatus status);
Detecting N+1 in Development
// 1. Enable SQL logging + count queries manually (crude but always works)
// logging.level.org.hibernate.SQL: DEBUG
// 2. Hibernate Statistics — counts queries per request
@Configuration
public class HibernateConfig {
@Bean
public HibernatePropertiesCustomizer hibernateStats() {
return props -> props.put("hibernate.generate_statistics", "true");
}
}
// Check log for: "Session Metrics" — shows query count, cache hits, time
// 3. datasource-proxy — logs query count per request with stack trace
// pom: com.github.gavlyukovskiy:datasource-proxy-spring-boot-starter
// application.yml:
// decorator.datasource.datasource-proxy.query.log-level: INFO
// 4. Hypersistence Optimizer — commercial, catches N+1 at test time
// Throws exception in tests when N+1 is detected — CI fails, never ships
// RULE: Always test list endpoints with 100+ items in dev.
// N+1 is invisible with 5 items but catastrophic with 500.
Quick check: a Product entity has @ManyToOne(fetch = FetchType.EAGER) to Category, and productRepository.findAll() returns 200 products. How many SQL queries run? With EntityManager.find(), EAGER becomes a JOIN — one query. But findAll() issues a JPQL query, and Hibernate fetches EAGER associations of JPQL results with separate SELECTs — one per distinct category not already in the persistence context. Either way, full Category rows are loaded even if you only need the product name. LAZY plus an explicit fetch join or a DTO projection is the better choice for list endpoints.
Transactions & @Transactional
@Transactional is the most used and most misunderstood annotation in Spring. Most engineers know it "wraps code in a transaction" — but they don't know about propagation, proxy limitations, and the production bugs caused by wrong configuration.
How @Transactional Works Internally
Spring creates a proxy around your bean. When a @Transactional method is called, the proxy intercepts the call, opens a transaction, calls the real method, then commits or rolls back.
Calling a @Transactional method from another method in the same class bypasses the proxy — the transaction annotation is silently ignored. This is one of the most common Spring bugs in production.
// ── SELF-INVOCATION BUG ─────────────────────────────────────────────
@Service
public class OrderService {
public void processOrder(Long id) {
createShipment(id); // BUG: calls through 'this', not the proxy
// @Transactional on createShipment is IGNORED
}
@Transactional // This is NEVER applied when called from above
public void createShipment(Long id) {
// no transaction — changes may not be saved
}
}
// FIX: inject the service into itself, or split into two beans
// ── PROPAGATION ────────────────────────────────────────────────────
@Transactional(propagation = Propagation.REQUIRED) // default: join existing or create new
@Transactional(propagation = Propagation.REQUIRES_NEW) // always create new, suspend outer
@Transactional(propagation = Propagation.SUPPORTS) // join if exists, non-tx if not
@Transactional(propagation = Propagation.MANDATORY) // must have outer tx — throws if not
@Transactional(propagation = Propagation.NEVER) // must NOT be in tx — throws if is
@Transactional(propagation = Propagation.NOT_SUPPORTED) // suspend outer tx, run non-tx
@Transactional(propagation = Propagation.NESTED) // savepoint within outer tx
// REQUIRES_NEW use case: audit log that must always commit even if main tx rolls back
@Service
public class AuditService {
@Transactional(propagation = Propagation.REQUIRES_NEW)
public void logEvent(AuditEvent event) {
auditRepository.save(event); // commits independently of caller's transaction
}
}
// ── READ-ONLY OPTIMIZATION ─────────────────────────────────────────
@Transactional(readOnly = true)
public List<OrderDto> findAllOrders() {
// readOnly = true:
// - Hibernate skips dirty checking (no change snapshots needed for reads)
// - flush mode is set to MANUAL — no accidental writes
// - routing datasources/drivers may direct the query to a read replica
// Use for ALL read-only operations — a meaningful performance gain
return orderRepository.findAll().stream()
.map(orderMapper::toDto) // mapper is illustrative
.toList();
}
// ── ROLLBACK RULES ────────────────────────────────────────────────
@Transactional(rollbackFor = Exception.class) // rollback on any Exception, checked included
@Transactional(noRollbackFor = BusinessException.class) // don't rollback for this
// Default: Spring only rolls back on RuntimeException and Error
// Checked exceptions do NOT trigger rollback by default!
Propagation in Practice
@Service
public class PaymentService {
private final AuditService auditService;
@Transactional
public PaymentResult processPayment(PaymentRequest request) {
try {
// This runs in Transaction A
Payment payment = createPayment(request);
chargeCard(payment);
// Audit log in SEPARATE transaction B — commits even if A fails
auditService.logPaymentAttempt(payment, "SUCCESS");
return PaymentResult.success(payment);
} catch (PaymentException e) {
// Transaction A will roll back — payment NOT saved
// But audit log (Transaction B) was already committed
auditService.logPaymentAttempt(null, "FAILED: " + e.getMessage());
throw e;
}
}
}
@Service
public class AuditService {
@Transactional(propagation = Propagation.REQUIRES_NEW)
public void logPaymentAttempt(Payment payment, String result) {
// Always commits independently — critical for audit trails
auditRepository.save(new AuditEvent(payment, result, Instant.now()));
}
}
Isolation Levels & Locking
Isolation levels are among the least-considered aspects of database usage in Spring Boot applications. Too weak a level invites data anomalies; too strong a level destroys throughput through lock contention. Every senior engineer must understand the trade-off.
// READ_UNCOMMITTED — can read uncommitted changes from other transactions
// Problem: dirty reads. Almost never use this.
@Transactional(isolation = Isolation.READ_UNCOMMITTED)
// READ_COMMITTED — only reads committed data (default in PostgreSQL/Oracle)
// Prevents dirty reads. Still allows non-repeatable reads and phantom reads.
// Good default for most applications.
@Transactional(isolation = Isolation.READ_COMMITTED) // use this for most reads
// REPEATABLE_READ — same row read twice gives same result (default MySQL InnoDB)
// Prevents dirty + non-repeatable reads. Still allows phantom reads.
// Higher overhead — use for financial calculations that read the same data twice.
@Transactional(isolation = Isolation.REPEATABLE_READ)
// SERIALIZABLE — full isolation, transactions appear sequential
// Prevents all anomalies. Severe performance impact — lock contention.
// Use only for critical financial operations (bank transfers, inventory allocation).
@Transactional(isolation = Isolation.SERIALIZABLE)
# DIRTY READ — reading uncommitted changes from another transaction
# Tx A writes balance=900 (not committed)
# Tx B reads balance=900
# Tx A rolls back — real balance is 1000
# Tx B acted on wrong data
# NON-REPEATABLE READ — same row, different value on second read
# Tx A reads order.status = PENDING
# Tx B updates order.status = SHIPPED (commits)
# Tx A reads order.status again = SHIPPED
# Decision based on first read is now wrong
# PHANTOM READ — same query, different rows on second run
# Tx A queries: SELECT COUNT(*) FROM seats WHERE available=true → 5
# Tx B inserts a new available seat (commits)
# Tx A queries again → 6 (phantom row appeared)
# Relevant for: inventory checks, capacity planning
# LOST UPDATE — two transactions overwrite each other
# Tx A reads account balance: 1000
# Tx B reads account balance: 1000
# Tx A writes: balance = 1000 - 100 = 900 (commits)
# Tx B writes: balance = 1000 - 200 = 800 (commits)
# Expected: 700. Actual: 800. Tx A's write was lost.
// Optimistic locking: no DB locks. Check version on UPDATE.
// If version changed since we read it, throw exception.
// Best for: low-contention scenarios, long-running user workflows
@Entity
public class Product {
@Id private Long id;
@Version // Hibernate manages this automatically
private Long version;
private int stockCount;
}
// Hibernate generates:
// UPDATE products SET stock_count=?, version=version+1
// WHERE id=? AND version=? ← if version mismatch, 0 rows updated → exception
@Service
@Transactional
public class InventoryService {
public void deductStock(Long productId, int quantity) {
Product product = productRepository.findById(productId).orElseThrow();
if (product.getStockCount() < quantity) {
throw new InsufficientStockException();
}
product.setStockCount(product.getStockCount() - quantity);
// If another thread updated stock between our read and write:
// → OptimisticLockingFailureException thrown
// → Caller should retry the operation
}
}
// Pessimistic locking: lock the row in the DB. Others wait.
// Best for: high-contention scenarios, guaranteed exclusivity needed
// PESSIMISTIC_WRITE: SELECT ... FOR UPDATE — blocks other writers and locking
// reads; plain MVCC snapshot reads still see the old row version
@Lock(LockModeType.PESSIMISTIC_WRITE)
@Query("SELECT p FROM Product p WHERE p.id = :id")
Optional<Product> findByIdForUpdate(@Param("id") Long id);
// PESSIMISTIC_READ: SELECT ... FOR SHARE (blocks writes, allows other reads)
@Lock(LockModeType.PESSIMISTIC_READ)
Optional<Product> findById(Long id);
// Usage
@Transactional
public void allocateStock(Long productId, int qty) {
// Row is locked for the entire transaction
Product product = productRepository.findByIdForUpdate(productId).orElseThrow();
if (product.getStockCount() < qty) throw new InsufficientStockException();
product.setStockCount(product.getStockCount() - qty);
// Lock released on commit
}
// Warning: pessimistic locks can cause DEADLOCKS if multiple transactions
// acquire locks in different order. Always lock in a consistent order.
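The consistent-order rule is general — it applies to JVM locks exactly as to row locks. A minimal sketch with plain ReentrantLocks standing in for locked rows:

```java
import java.util.concurrent.locks.ReentrantLock;

public class LockOrderingDemo {
    static final ReentrantLock rowA = new ReentrantLock(); // stands in for row id=1
    static final ReentrantLock rowB = new ReentrantLock(); // stands in for row id=2

    // Every "transaction" acquires locks in the SAME (ascending id) order
    static void allocate(String tx) {
        rowA.lock(); // lowest id first — always
        try {
            rowB.lock();
            try {
                System.out.println(tx + " holds both locks");
            } finally { rowB.unlock(); }
        } finally { rowA.unlock(); }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread t1 = new Thread(() -> allocate("tx1"));
        Thread t2 = new Thread(() -> allocate("tx2"));
        t1.start(); t2.start();
        t1.join(); t2.join();
        // If one thread took rowB before rowA instead, each could hold one lock
        // while waiting forever for the other — the classic deadlock cycle.
        System.out.println("done");
    }
}
```

In SQL terms: if every transaction locks products in ascending id order (e.g., ORDER BY id in the locking SELECT), a wait cycle cannot form.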
JPQL, Criteria API & Native Queries
Spring Data gives you multiple ways to query the database. Each has a different sweet spot — understanding when to use which prevents both over-engineering and under-powered queries.
// Spring Data generates SQL from method names — great for simple queries
public interface OrderRepository extends JpaRepository<Order, Long> {
// SELECT * FROM orders WHERE status = ?
List<Order> findByStatus(OrderStatus status);
// SELECT * FROM orders WHERE customer_id = ? AND status = ?
List<Order> findByCustomerIdAndStatus(Long customerId, OrderStatus status);
// SELECT * FROM orders WHERE total > ?
List<Order> findByTotalGreaterThan(BigDecimal amount);
// SELECT * FROM orders WHERE created_at BETWEEN ? AND ?
List<Order> findByCreatedAtBetween(LocalDateTime from, LocalDateTime to);
// SELECT * FROM orders WHERE LOWER(notes) LIKE LOWER('%keyword%')
List<Order> findByNotesContainingIgnoreCase(String keyword);
// With sorting and pagination
Page<Order> findByStatus(OrderStatus status, Pageable pageable);
// Existence check (SELECT COUNT(*) > 0)
boolean existsByCustomerIdAndStatus(Long customerId, OrderStatus status);
// Count
long countByStatus(OrderStatus status);
// Delete
@Modifying
@Transactional
void deleteByStatusAndCreatedAtBefore(OrderStatus status, LocalDateTime before);
}
public interface OrderRepository extends JpaRepository<Order, Long> {
// JOIN FETCH to avoid N+1
@Query("SELECT o FROM Order o JOIN FETCH o.customer JOIN FETCH o.items WHERE o.id = :id")
Optional<Order> findByIdWithDetails(@Param("id") Long id);
// DTO projection — only loads needed fields
@Query("""
SELECT new com.myapp.dto.OrderSummaryDto(
o.id, c.name, o.total, o.status, o.createdAt
)
FROM Order o JOIN o.customer c
WHERE o.status = :status
ORDER BY o.createdAt DESC
""")
List<OrderSummaryDto> findSummaryByStatus(@Param("status") OrderStatus status);
// Bulk update — much faster than loading + updating entities one by one
@Modifying
@Transactional
@Query("UPDATE Order o SET o.status = :newStatus WHERE o.status = :oldStatus AND o.createdAt < :before")
int bulkUpdateStatus(@Param("oldStatus") OrderStatus old,
@Param("newStatus") OrderStatus newStatus,
@Param("before") LocalDateTime before);
// Pagination with count query optimized separately
@Query(value = "SELECT o FROM Order o JOIN FETCH o.customer WHERE o.status = :status",
countQuery = "SELECT COUNT(o) FROM Order o WHERE o.status = :status")
Page<Order> findByStatusWithCustomer(@Param("status") OrderStatus status, Pageable pageable);
}
// Use native SQL for: database-specific features, complex analytics,
// window functions, CTEs, full-text search, JSONB queries
@Query(value = """
WITH ranked_orders AS (
SELECT o.*,
ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY created_at DESC) AS rn
FROM orders o
WHERE status = :status
)
SELECT * FROM ranked_orders WHERE rn = 1
""", nativeQuery = true)
List<Object[]> findLatestOrderPerCustomer(@Param("status") String status);
// Map to interface projection (cleaner than Object[])
@Query(value = """
SELECT o.id, o.total, c.name AS customer_name,
COUNT(oi.id) AS item_count
FROM orders o
JOIN customers c ON o.customer_id = c.id
JOIN order_items oi ON oi.order_id = o.id
WHERE o.created_at >= :from
GROUP BY o.id, o.total, c.name
ORDER BY o.total DESC
""", nativeQuery = true)
List<OrderReportProjection> getOrderReport(@Param("from") LocalDateTime from);
// Interface projection — Spring creates a proxy
public interface OrderReportProjection {
Long getId();
BigDecimal getTotal();
String getCustomerName();
int getItemCount();
}
// Interface projection — Spring generates proxy, only selected columns loaded
public interface ProductSummary {
Long getId();
String getName();
BigDecimal getPrice();
// Computed expression — NOTE: a SpEL @Value makes this an "open" projection,
// so Spring falls back to loading the full entity, not just three columns
@Value("#{target.price * 1.2}")
BigDecimal getPriceWithTax();
}
// Spring Data automatically projects to interface
List<ProductSummary> findByCategory(String category);
// For a closed projection (no @Value): SELECT id, name, price FROM products WHERE category = ?
// (NOT SELECT * — only the columns the projection declares)
// Record projection (Hibernate 6+ / Spring Boot 3+)
public record ProductSummaryDto(Long id, String name, BigDecimal price) {}
// Works with derived queries and @Query
List<ProductSummaryDto> findByCategory(String category);
Connection Pooling with HikariCP
Every database interaction requires a connection. Creating a new TCP connection per request would be catastrophic — it takes 20–100ms. Connection pools maintain a set of open connections and hand them out on demand. HikariCP is Spring Boot's default and the fastest Java connection pool available.
A common misconception: "more connections = better performance." The opposite is often true. Database servers have limited resources. Too many connections cause context-switching overhead and memory pressure on the DB. The right pool size depends on your workload — too small causes timeouts, too large causes DB overload.
spring:
datasource:
hikari:
# Pool size — formula: (core_count * 2) + effective_spindle_count
# For I/O-bound workloads: 10-20 is often ideal
maximum-pool-size: 20
minimum-idle: 5 # keep 5 warm connections always
# Timeout settings
connection-timeout: 30000 # 30s max wait to get a connection from pool
idle-timeout: 600000 # 10min — remove idle connections
max-lifetime: 1800000 # 30min — recycle connections (prevent stale)
keepalive-time: 60000 # ping idle connections every 60s
# Validation
connection-test-query: SELECT 1 # only for legacy pre-JDBC4 drivers — modern drivers validate via isValid()
# Pool name — appears in logs and JMX
pool-name: OrderServicePool
# Leak detection — logs stack trace if connection held > this duration
leak-detection-threshold: 5000 # 5 seconds
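The sizing rule of thumb from the comment above, as plain arithmetic. Inputs are illustrative — an SSD is usually treated as roughly one effective spindle, and I/O-bound services often do fine at 10–20 connections regardless:

```java
public class PoolSizeFormula {
    static int recommendedPoolSize(int coreCount, int effectiveSpindleCount) {
        return coreCount * 2 + effectiveSpindleCount; // the rule of thumb above
    }

    public static void main(String[] args) {
        System.out.println(recommendedPoolSize(8, 1));  // 8-core DB host on SSD
        System.out.println(recommendedPoolSize(16, 1)); // 16-core DB host on SSD
    }
}
```

Note the inputs are the database server's cores, not the application's — the pool exists to avoid overwhelming the DB.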
Connection Pool Exhaustion
When all pool connections are in use, new requests wait up to connection-timeout then fail with SQLTimeoutException: Unable to acquire JDBC Connection. This cascades: slow DB queries hold connections longer → more requests queue → server appears down. Root causes: slow queries, N+1 problems, transactions left open too long, or pool sized too small for traffic.
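Little's law explains the cascade: connections in use ~= arrival rate x time each request holds a connection. A quick illustration with made-up numbers:

```java
public class PoolExhaustionMath {
    public static void main(String[] args) {
        int poolSize = 20;
        double requestsPerSec = 50; // illustrative traffic level
        for (double holdMs : new double[]{100, 400, 2000}) {
            double inUse = requestsPerSec * holdMs / 1000.0; // Little's law: L = lambda * W
            String flag = inUse > poolSize ? "  EXHAUSTED: requests queue, then time out" : "";
            System.out.printf("hold=%.0fms -> ~%.0f connections in use (pool=%d)%s%n",
                    holdMs, inUse, poolSize, flag);
        }
    }
}
```

A query that slows from 100 ms to 2 s multiplies connection demand twentyfold at the same traffic — which is why the fix is usually the query, not a bigger pool.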
// HikariCP exposes metrics via Micrometer — view in Actuator
// GET /actuator/metrics/hikaricp.connections.active
// GET /actuator/metrics/hikaricp.connections.pending
// GET /actuator/metrics/hikaricp.connections.idle
// In application.yml, expose via Prometheus:
// management.endpoints.web.exposure.include: metrics, prometheus
// Key metrics to alert on:
// hikaricp.connections.pending > 0 — connections are being waited for
// hikaricp.connections.active / hikaricp.connections.max > 0.8 — 80% utilization
// hikaricp.connections.timeout.total increasing — requests timing out
// Programmatic access to pool statistics
import com.zaxxer.hikari.HikariDataSource;
import com.zaxxer.hikari.HikariPoolMXBean;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;

@Component
public class PoolHealthChecker {

    @Autowired
    private HikariDataSource dataSource;

    // Simple carrier for the four numbers worth watching
    public record PoolStats(int active, int idle, int awaiting, int total) {}

    public PoolStats getStats() {
        HikariPoolMXBean pool = dataSource.getHikariPoolMXBean();
        return new PoolStats(
            pool.getActiveConnections(),
            pool.getIdleConnections(),
            pool.getThreadsAwaitingConnection(),
            pool.getTotalConnections()
        );
    }
}
Schema Migrations — Flyway & Liquibase
Never use hibernate.ddl-auto=create or update in anything beyond local development. Production schemas must be managed with version-controlled migration scripts that are repeatable, auditable, and reversible. Flyway and Liquibase are the two industry standards.
Flyway
SQL-first migrations. Simple, explicit, easy to audit. Each migration is a .sql file with a version number. Hard to undo — design for forward-only changes.
Liquibase
XML/YAML/JSON changesets with built-in rollback support. More complex but supports undo operations and multi-database abstraction. Preferred for large enterprise teams.
spring:
  flyway:
    enabled: true
    locations: classpath:db/migration   # where migration files live
    baseline-on-migrate: true           # allow migrating existing DBs
    validate-on-migrate: true           # validate checksums on startup
    out-of-order: false                 # strict version ordering
  jpa:
    hibernate:
      ddl-auto: validate                # Hibernate validates against the migrated schema
src/main/resources/db/migration/
V1__create_customers_table.sql
V2__create_orders_table.sql
V3__add_order_status_index.sql
V4__add_customer_email_unique.sql
V5__add_products_table.sql
V6__add_shipping_address_to_orders.sql
V7__add_order_items_table.sql
-- V2__create_orders_table.sql
CREATE TABLE orders (
id BIGSERIAL PRIMARY KEY,
customer_id BIGINT NOT NULL REFERENCES customers(id),
status VARCHAR(20) NOT NULL DEFAULT 'PENDING',
total NUMERIC(12, 2) NOT NULL DEFAULT 0,
notes TEXT,
version BIGINT NOT NULL DEFAULT 0,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_orders_customer_id ON orders(customer_id);

-- V3__add_order_status_index.sql
CREATE INDEX idx_orders_status_created ON orders(status, created_at DESC);
-- V7__add_order_items_table.sql
CREATE TABLE order_items (
id BIGSERIAL PRIMARY KEY,
order_id BIGINT NOT NULL REFERENCES orders(id) ON DELETE CASCADE,
product_id BIGINT NOT NULL REFERENCES products(id),
quantity INT NOT NULL CHECK (quantity > 0),
unit_price NUMERIC(12, 2) NOT NULL
);
CREATE INDEX idx_order_items_order_id ON order_items(order_id);
Flyway checksums each migration file. If you modify a file that's already been applied to any environment, Flyway will refuse to start with a checksum mismatch error. Always create a new migration to fix a previous one. This is not a limitation — it's an audit trail.
For live production deployments: (1) Add columns as nullable first — existing rows won't fail. (2) Never remove columns in the same release that removes the code using them — do it in the next release. (3) Add indexes CONCURRENTLY in PostgreSQL to avoid table locks. (4) Rename columns with the expand/contract pattern: add the new column, dual-write to both while backfilling, switch reads to the new column, then drop the old one in a later release.
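The column-rename advice above can be sketched as a sequence of migrations. The rename itself (orders.total to orders.total_amount) is invented purely for illustration:

```sql
-- Release 1: expand — add the new column as nullable so existing rows don't fail
ALTER TABLE orders ADD COLUMN total_amount NUMERIC(12, 2);

-- Backfill while the application dual-writes both columns
UPDATE orders SET total_amount = total WHERE total_amount IS NULL;

-- Release 2: reads have switched to total_amount; now enforce the constraint
ALTER TABLE orders ALTER COLUMN total_amount SET NOT NULL;

-- Release 3: contract — drop the old column once no deployed code references it
ALTER TABLE orders DROP COLUMN total;
```

Each step ships in its own release, so at every point the running code and the schema agree.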
Query Optimization & Indexing
A slow query in production is a crisis. Understanding how to diagnose, index, and rewrite slow queries is one of the highest-value skills a backend engineer can have.
-- Always use EXPLAIN ANALYZE in PostgreSQL to diagnose slow queries
EXPLAIN ANALYZE
SELECT o.*, c.name
FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE o.status = 'PENDING'
ORDER BY o.created_at DESC
LIMIT 20;
-- Look for:
-- "Seq Scan" on large tables → missing index
-- "Rows Removed by Filter: 50000" → poor selectivity
-- high "actual time" → bottleneck found
-- "Nested Loop" with large outer rows → join optimization needed
-- Bad output (no index):
-- Seq Scan on orders (cost=0.00..45231.00 rows=123 width=200)
-- (actual time=0.1..2340.5 rows=123 loops=1)
-- Filter: (status = 'PENDING')
-- Rows Removed by Filter: 500000
-- Good output (with index):
-- Index Scan using idx_orders_status_created on orders
-- (actual time=0.05..1.2 rows=123 loops=1)
-- Index Cond: (status = 'PENDING')
Indexing Strategy
-- ALWAYS INDEX: foreign keys (Postgres doesn't auto-index them like MySQL)
CREATE INDEX idx_orders_customer_id ON orders(customer_id);
-- Composite index — column ORDER matters. Left-prefix rule.
-- This index supports: WHERE status=? AND created_at>?
-- WHERE status=?
-- NOT: WHERE created_at>? (status must be first)
CREATE INDEX idx_orders_status_created ON orders(status, created_at DESC);
-- Partial index — smaller, faster for filtered queries
CREATE INDEX idx_orders_pending ON orders(created_at DESC)
WHERE status = 'PENDING';
-- Only indexes pending orders — much smaller than a full index
-- Covering index — includes extra columns to avoid table lookups
CREATE INDEX idx_orders_customer_covering
ON orders(customer_id) INCLUDE (status, total, created_at);
-- Query can be answered from index alone — no heap access
-- CONCURRENTLY — add index without locking the table (Postgres)
CREATE INDEX CONCURRENTLY idx_customers_email_lookup ON customers(email);
-- Takes longer but production traffic is not blocked
Every index slows down INSERT/UPDATE/DELETE operations because the index must be maintained. A table with 15 indexes takes 3–5× longer to insert into than one with 2. Index only columns that appear in WHERE clauses of frequent queries, join conditions, and ORDER BY on large result sets. Drop unused indexes — PostgreSQL's pg_stat_user_indexes shows index usage statistics.
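The pg_stat_user_indexes view mentioned above can be queried directly to find candidates for removal. A sketch (idx_scan counts index scans since statistics were last reset, so run this only after a representative traffic period):

```sql
-- Indexes that have never been scanned since the last stats reset,
-- largest first — prime candidates for dropping
SELECT schemaname,
       relname       AS table_name,
       indexrelname  AS index_name,
       idx_scan,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;
```

Before dropping, confirm the index isn't backing a unique constraint or used only by rare-but-critical queries (month-end reports, for example).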
When NOT to Use JPA
JPA is powerful but not the right tool for every job. Knowing when to bypass it is a sign of engineering maturity. Forcing everything through Hibernate can be 10–100× slower than well-written SQL for certain workloads.
Bulk Inserts & Updates
JPA loads entities into memory, runs dirty checking, and generates individual statements. Use Spring's JdbcTemplate.batchUpdate() or a COPY command for bulk operations — 10–100× faster.
Complex Reporting & Analytics
Window functions, CTEs, GROUP BY with HAVING, ROLLUP — natural in SQL but awkward or impossible in JPQL. Use @Query(nativeQuery=true) or JdbcTemplate for analytics queries.
Reading a Single Column
Loading a full entity just to check one column is wasteful. Use a scalar projection — @Query("SELECT o.status FROM Order o WHERE o.id = :id") — or an interface projection.
Database-Specific Features
PostgreSQL's JSONB operators, tsvector full-text search, and array types don't map cleanly to JPA. Use native queries or Spring Data's custom query support.
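To make the database-specific point concrete, here are a few PostgreSQL queries with no JPQL equivalent. The attributes, search_vector, and tags columns are hypothetical; in a Spring app these would run via @Query(nativeQuery = true) or JdbcTemplate:

```sql
-- JSONB: products whose 'color' attribute is 'red' (->> extracts as text)
SELECT * FROM products WHERE attributes ->> 'color' = 'red';

-- Full-text search against a precomputed tsvector column
SELECT * FROM products
WHERE search_vector @@ to_tsquery('english', 'wireless & keyboard');

-- Array containment: the tags array contains both values
SELECT * FROM products WHERE tags @> ARRAY['sale', 'clearance'];
```

Each of these can also use a specialized index (GIN on the JSONB, tsvector, or array column), which JPA has no vocabulary for at all.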
@Service
@RequiredArgsConstructor
public class BulkImportService {
private final JdbcTemplate jdbcTemplate;
// Insert 100,000 rows efficiently
public void bulkInsertProducts(List<ProductImportDto> products) {
String sql = "INSERT INTO products (name, sku, price, category) VALUES (?,?,?,?)";
jdbcTemplate.batchUpdate(sql, products, 1000, (ps, product) -> {
ps.setString(1, product.name());
ps.setString(2, product.sku());
ps.setBigDecimal(3, product.price());
ps.setString(4, product.category());
});
// Sends 100 batches of 1000 rows each — MUCH faster than 100,000 individual inserts
}
// Complex analytics query — not possible in JPQL
public List<RevenueByCategory> getMonthlyRevenue(int year) {
return jdbcTemplate.query("""
SELECT c.name AS category,
EXTRACT(MONTH FROM o.created_at) AS month,
SUM(oi.quantity * oi.unit_price) AS revenue
FROM orders o
JOIN order_items oi ON oi.order_id = o.id
JOIN products p ON oi.product_id = p.id
JOIN categories c ON p.category_id = c.id
WHERE EXTRACT(YEAR FROM o.created_at) = ?
AND o.status = 'COMPLETED'
GROUP BY c.name, month
ORDER BY month, revenue DESC
""", (rs, row) -> new RevenueByCategory(
rs.getString("category"),
rs.getInt("month"),
rs.getBigDecimal("revenue")
), year);
}
}
Quick check: a @Transactional service method loads an entity, modifies a field, but never calls save(). What happens? The change is still persisted. At commit, Hibernate flushes the persistence context, compares each managed entity against the snapshot taken when it was loaded, and issues an UPDATE for anything that changed — no save() call needed. This is automatic dirty checking and is one of JPA's most important features.
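Conceptually, dirty checking is snapshot-and-diff. The following is a simplified illustration of the idea, not Hibernate's actual implementation (all names are invented):

```java
import java.util.HashMap;
import java.util.Map;

public class DirtyCheckDemo {

    static class Order {
        String status;
        Order(String status) { this.status = status; }
    }

    // Stand-in for the persistence context: keyed by object identity,
    // just like Hibernate's first-level cache
    static class MiniPersistenceContext {
        private final Map<Order, String> snapshots = new HashMap<>();

        void load(Order order) {
            snapshots.put(order, order.status); // snapshot taken at load time
        }

        boolean isDirty(Order order) {
            // at flush, current state is compared against the snapshot
            return !snapshots.get(order).equals(order.status);
        }
    }

    public static void main(String[] args) {
        MiniPersistenceContext ctx = new MiniPersistenceContext();
        Order order = new Order("PENDING");
        ctx.load(order);
        order.status = "COMPLETED";              // mutate the managed entity
        System.out.println(ctx.isDirty(order));  // prints true -> UPDATE would be issued
    }
}
```

Hibernate does the real thing per field, with bytecode tricks and batching, but the mental model is the same: mutate a managed entity inside a transaction and the UPDATE happens at flush.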
Interview Questions
How do you avoid LazyInitializationException in production?
Set spring.jpa.open-in-view=false, map entities to DTOs inside the @Transactional service, and use explicit fetch joins for the associations you need.
Why didn't my transaction roll back on a checked exception?
Spring's @Transactional rolls back only on unchecked exceptions by default. Use @Transactional(rollbackFor = Exception.class) or throw a RuntimeException wrapper.
Optimistic vs pessimistic locking — when do you use each?
Optimistic locking (@Version) assumes conflicts are rare — it doesn't lock rows, instead checking the version number on UPDATE. If the version changed since the read, it throws OptimisticLockingFailureException and the caller retries. Best for user-facing workflows with low conflict probability. Pessimistic locking (SELECT FOR UPDATE) locks the row in the database immediately, making other transactions wait. Best for inventory reservation and financial operations where concurrent modification cannot be allowed. Pessimistic locking risks deadlocks if transactions acquire locks in different orders.
Why is ddl-auto=update unsafe in production?
It only adds columns and tables — it never removes them or changes column types safely. It can't handle data migrations. It runs DDL at application startup, delaying or blocking it. It has no version history or audit trail. Most critically, it's unpredictable when multiple instances deploy concurrently. Use Flyway or Liquibase: explicit, versioned, auditable migrations that run exactly once and fail loudly if something goes wrong.