InterviewDB Question

AI Code Craft - Design a Code Generation and Review Pipeline

Question Details

Round 1: Coding / System Design

Problem

Design an internal tool that takes a natural language task description and outputs working code with an automated review step. The pipeline has three stages: (1) code generation, (2) static analysis, (3) test execution.

Part A - Code: Implement the pipeline orchestrator.

```python
class CodeCraftPipeline:
    def run(
        self,
        task: str,
        language: str = "python"
    ) -> dict:
        # Returns:
        # {
        #   "code": str,
        #   "lint_issues": list[str],
        #   "test_results": {"passed": int, "failed": int},
        #   "status": "approved" | "needs_review"
        # }
        ...

    def generate_code(self, task: str, language: str) -> str: ...
    def lint(self, code: str) -> list[str]: ...
    def run_tests(self, code: str) -> dict: ...
```

**Example**:

```python
pipeline = CodeCraftPipeline()
result = pipeline.run("write a function that reverses a linked list")
# result["code"]         -> "def reverse_list(head): ..."
# result["lint_issues"]  -> []
# result["test_results"] -> {"passed": 3, "failed": 0}
# result["status"]       -> "approved"
```

Follow-ups

  1. How would you sandbox test execution to prevent malicious code from running? (One possible approach is sketched after this list.)
  2. What feedback loop would you add so lint and test failures are fed back into code regeneration?
  3. How would you version and store generated code artifacts?
  4. How do you evaluate the quality of generated code beyond passing tests?
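For follow-up 1, one possible process-isolation layer, assuming a POSIX host with pytest installed: run the generated code in a child process with a wall-clock timeout and OS resource limits, and treat any breach as a failure. This simplifies the spec's per-test counts to a single pass/fail, and a production system would add stronger isolation (containers, gVisor/Firecracker, no network access).

```python
import resource
import subprocess
import sys
import tempfile

def run_tests_sandboxed(code: str, timeout_s: int = 10) -> dict:
    """Run untrusted generated code in a child process with basic limits."""

    def limit_resources():
        # Applied in the child before exec: cap CPU seconds and address space.
        resource.setrlimit(resource.RLIMIT_CPU, (timeout_s, timeout_s))
        resource.setrlimit(resource.RLIMIT_AS, (512 * 1024**2, 512 * 1024**2))

    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name

    try:
        proc = subprocess.run(
            [sys.executable, "-m", "pytest", path, "-q"],
            capture_output=True,
            text=True,
            timeout=timeout_s,           # wall-clock limit enforced by the parent
            preexec_fn=limit_resources,  # POSIX-only; use a container elsewhere
        )
        passed = proc.returncode == 0
        return {"passed": int(passed), "failed": int(not passed), "output": proc.stdout}
    except subprocess.TimeoutExpired:
        return {"passed": 0, "failed": 1, "output": "timed out"}
```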


Topics

Coding Onsite