

#### Outline

- What is jor1k? Overview
- The CPU implementation exemplified on the RISC-V architecture (GSoC 2015)
- How to not idle in JavaScript
- Filesystem and Network
- Outlook

## What is jor1k?

#### The website: jor1k.com



asm.js Core

Bugtracker to report any issues or feature requests

#### Developer and contributors

Main developer - Sebastian Macke simulationcorner.net Gratipay Network support - Ben Burns benjamincburns.com Gratipay

www.github.com/s-macke/jor1k





#### Jor1k Modules



# RISC-V GSOC 2015 project

**Student: Prannoy Pilligundla** 

## RISC-V observations from someone who just wanted to write an emulator

- Well designed ISA
  - 32 registers , 4 byte instructions, Easy encoding
  - Modules (32-Bit, 64-Bit, 128-Bit, Integer, Floating Point, Atomic, .....)
  - No history
  - Little endian
  - Only 2 instructions for random 32-Bit jumps
  - No delayed instruction
  - No conditional instructions
  - No Flags
  - Complete implementation of mul and div
  - PC relative addressing modes ("auipc")
  - A lot of atomic instructions
  - Is even more like bytecode in comparison to OpenRISC (less side effects)
- Linux does run with the help of a proxy kernel and does not have direct access to hardware

## RISC-V observations from someone who just wanted to write an emulator

- Complete/Good hardware devices support is still missing.
  - No MMIO, instead two register based HTIF interface
  - Hardware devices in 32-Bit are currently not supported

- Still lots of errors in the toolchain
  - 32-Bit is not well tested
- Documentation is still incomplete inconsistencies with emulator
- Confusing mode switching (traps, ecall, eret, ebreak, mrts)

JavaScript Modules



## Why JavaScript?

JavaScript is the language of the Web

- It runs everywhere

- It is available immediately

#### JavaScript

- Doesn't support 64-Bit integers
- JavaScript is very creative with typing

```
- [1,2,3]+[4,5,6] => 1,2,34,5,6
- {} + {} => NaN
- 0 == "" => true
- x=Math.sqrt(-2); if (x==x) => false
- if (new Array() == false) => true
```

- JavaScript is considered "slow"
- At least four companies are writing optimized compilers to squeeze out the maximum performance.

#### All numbers are double

There are no integers, only doubles

But there are logical operations

- But there are typed arrays
  - -x = new Uint32Array(length)

#### How to prevent deoptimizations?

 What's wrong with the following code concerning the compiled code?

```
for(;;) {
   Advance Timer();
   Check Interr
                             What happens internal?
   ppc = Transl
           ram.Re
                             1. Add 4 to pc (integer)
                              2. Check for overflow
                       3. Deoptimize into double if overflow
   switch (ins&
                      4. Cascade of deoptimizations where pc is
                                       used.
```

#### How to prevent deoptimizations?

Prevent overflow by adding a "|0"

```
- for(;;) {
    Advance Timer();
    Check Interrupt();
    ppc = Translate Virtual To Physical (pc);
    ins = ram.Read32(
                             What happens internal?
    pc = (pc + 4) | 0;
                             1. Add 4 to pc (integer)
    // decode instruct
                               2. Ignore noop "|0"
    switch (ins&0x7F)
```

#### How to prevent deoptimizations?

Add more typing helpers

```
- for(;;) {
   Advance Timer();
    Check Interrupt();
    ppc = Translate Virtual To Physical(pc|0)|0;
    ins = ram.Read32(ppc|0)|0;
    pc = (pc + 4) | 0;
    // decode instruction
    switch (ins&0x7F) {
```

#### What is asm.js

- The mode "use strict"; adds additional error messages for accessing undefined variables.
- The mode "use asm"; adds additional error messages to give you a guarantee for fixed type variables that must be compiled only once.
  - Only a subset of Javascript is allowed
  - Fully compatible
- Implemented in Firefox in 2013
- Implemented in Edge in 2015

#### What is asm.js

But the syntax is nasty

```
-group0[SPR_IMMUCFGR] = 0x18;
-h[group0p + (SPR_IMMUCFGR<<2) >> 2]
= 0x18;
```

- h is the heap and group 0p is the pointer to the table
- In this case the "view" of the heap is 32 Bit.
   Therefore the last operation for the index must be ">> 2"
- Project Emscripten allows to translate C++ to asm.js JavaScript

## Benchmark from last year



## Benchmark one week ago



### Benchmark one week ago

- Firefox on Core2Duo: 120 MIPS
- Firefox on Core i7 2600: 75 MIPS
- Firefox on Celeron G1820: 180 MIPS
- Firefox on Core i7 4770: 246 MIPS
- Chrome: on Core i7 2600: 60 MIPS
- IE 11 on Core i7 2600: 68 MIPS
- Safari on Apple A7: 81 MIPS

Approx. speed of the CPUs of the year 1997

OpenRISC CPU implementation: 1609 lines of code RISC-V CPU implementation: 2181 lines of code

dd gzip bzip2

#### Architecture

OpenRISC is easy

```
switch ((ins >> 26)&0x3F) {
...
    case 0x29: // 1.andi
    r[(ins>>21)&0x1F] = r[(ins>>16)&0x1F] & (ins&0xFFFF);
    continue;
...
}
```

RISC-V is easy too, but not that easy

```
switch (ins&0x7F) {
    ...
    case 0x13:
        switch((ins >> 12)&0x7) {
            ...
            case 0x7: // andi
            r[(ins>>7)&0x1F] = r[(ins>>15)&0x1F] & (ins>>20);
            continue;
```

To decode an instruction in RISC-V takes 2 switches in average and not 1 like for OpenRISC

#### Instruction emulation for ARM

```
void
armv5 and(){
               uint32 t icode = ICODE;
               int rn,rd;
               uint32 t cpsr=REG CPSR;
               uint32 t Rn,op2,result;
               uint32 t S;
               if(!check condition(icode)) {
                              return;
               rd=(icode>>12) &0xf;
               rn=(icode>>16) &0xf;
               Rn=ARM9 ReadReg(rn);
               cpsr&= ~(FLAG N | FLAG Z | FLAG C);
               cpsr |= get data processing operand(icode);
               op2 = AM SCRATCH1;
               result=Rn&op2;
               ARM9 WriteReg(result,rd);
               S=testbit(20,icode);
               if(S) {
                               if(!result) {
                                              cpsr|=FLAG Z;
                               if(ISNEG(result)) {
                                              cpsr|= FLAG N;
                               if(rd==15) {
                                              if (MODE HAS SPSR) {
                                                              SET REG CPSR (REG SPSR);
                                              } else {
                                                              fprintf(stderr, "Mode has no spsr in line %d\n", LINE );
                               } else {
                                              REG CPSR=cpsr;
               dbgprintf("AND result op1 %08x,op2 %08x, result %08x\n",Rn,op2,result);
```

#### Fence technique

- For every instruction the program counter (pc) must be translated to the physical pc. However, the pc advances usually by 4 not leaving the current page.
- The fastpath for one instruction looks like this:

```
for(;;) {
   if ((physical_pc|0) != (fence|0)) {
      ins = int32ram[physical_pc >> 2]|0;
      physical_pc = physical_pc + 4|0;

      switch (ins&0x7F) {
      ....
      }
   } else {
      Advance_Timer();
      Check_Interrupt();
      .....
   }
}
```

The idea here is that the virtual pc is computed only when needed by translating ppc (physical pc) back to the virtual pc address. The variable fence is used to break out of the fast path when ppc reaches a jump or the end of the current page.

# Assembly level optimizations

| Javascript code             | Block description                       | Generated x86 asm code                                                                                        |
|-----------------------------|-----------------------------------------|---------------------------------------------------------------------------------------------------------------|
| for(;;) {                   | .set .Llabel132981, .                   |                                                                                                               |
| if ((fence 0) != (ppc 0)) { |                                         | Movl (nil), %ecx<br>Movl (nil), %eax<br>Cmpl %eax, %ecx<br>je .Lfrom133000                                    |
| ins = ram[ppc >> 2] 0;      | MoveGroup<br>BitOpl:bitand              | movl %eax, %ecx andl \$0xffffffffffffc, %ecx                                                                  |
|                             | AsmJSLoadHeap                           | <pre>cmpl \$0xfffffffc, %ecx ja .Lfrom133022 movl 0x0000(%ecx), %ecx</pre>                                    |
|                             | MoveGroup                               | movl %ecx, 0x2c(%esp)                                                                                         |
| ppc = ppc + 4   0;          | instruction Addl<br>AsmJSStoreGlobalVar | addl \$4, %eax movl %eax, (nil)                                                                               |
| switch(ins&0x7F) {          | MoveGroup instruction BitOpI:bitand     | movl 0x2c(%esp), %edx andl \$0x7f, %edx                                                                       |
|                             | TableSwitch                             | <pre>subl \$3, %edx cmpl \$0x71, %edx jae .Lfrom133081 movl \$0xfffffffff, %ecx jmp *0x0(%ecx, %eax, 4)</pre> |

| Javascript code             | Block description                       | Generated x86 asm code                                                                             |
|-----------------------------|-----------------------------------------|----------------------------------------------------------------------------------------------------|
| for(;;) {                   | .set .Llabel132981, .                   |                                                                                                    |
| if ((fence 0) != (ppc 0)) { |                                         | Movl (nil), %ecx<br>Movl (nil), %eax<br>Cmpl %eax, %ecx<br>je .Lfrom133000                         |
| ins = r2m[ppc >> 2] 0;      | MoveGroup<br>BitOpI:bitand              | movl %eax, %ecx andl \$0xfffffffc, %ecx                                                            |
|                             | AsmJSLoadHeap                           | cmpl \$0xffffffffc, %ecx  Jinnecessary load movie x 0xfffffffc, %ecx                               |
|                             | MoveGroup                               | movl %ecx, 0x2c(%esp)                                                                              |
| ppc = ppc + 4 0;            | instruction Addl<br>AsmJSStoreGlobalVar | addl \$4, %eax movl %eax (nil)                                                                     |
| switch(ins&0).7F) {         | MoveGroup instruction BitOpl:bitand     | movl 0x2c(%esp), %edx andl \$0x/1, %eax                                                            |
|                             | TableSwitch                             | Ineyswork on it cmpl \$0x71, %edx jae .Lfrom133081 movl \$0xffffffff, %ecx jmp *0x0(%ecx, %eax, 4) |

| Javascript code             | Block description                           | Generated x86 asm code                                                                                  |
|-----------------------------|---------------------------------------------|---------------------------------------------------------------------------------------------------------|
| for(;;) {                   | .set .Llabel132981, .                       |                                                                                                         |
| if ((fence 0) != (ppc 0)) { |                                             | Movl (nil), %ecx Movl (nil), %eax Cmpl %eax, %ecx je .Lfrom133000                                       |
| ins = ram[ppc >> 2] 0;      | MoveGroup<br>BitOpI:bitand                  | movl %eax, %ecx andl \$0xffffffffffffc, %ecx                                                            |
|                             | AsmJSLoadHeap                               | <pre>cmpl \$0xfffffffc, %ecx ja .Lfrom133022 movl 0x0000(%ecx), %ecx</pre>                              |
|                             | MoveGroup                                   | movl %ecx, 0x2c(%esp)                                                                                   |
| ppc = ppc + 4 0;            | instruction Addl 1 S<br>AsmJSStoreGlobalVar | worl %eax, (nil)                                                                                        |
| switch(ins&0x7F) {          | MoveGroup instruction BitOpl:bitand         | tesinga taple edx                                                                                       |
|                             | TableSwitch Ad Cas                          | subl \$3, %edx  daduffing \$0x71, %edx  daduffing \$081  movl \$0xffffffff, %ecx  enp0*0x0(%ecx,%eax,4) |

| Javascript code             | Block description                   | Generated x86 asm code                                                                             |
|-----------------------------|-------------------------------------|----------------------------------------------------------------------------------------------------|
| for(;;) {                   | .set .Llabel132981, .               |                                                                                                    |
| if ((fence 0) != (ppc 0)) { |                                     | Movl (nil), %ecx<br>Movl (nil), %eax<br>Cmpl %eax, %ecx<br>je .Lfrom133000                         |
| ins = ram[ppc >> 2] 0;      | MoveGroup<br>BitOpI:bitand          | movl %eax, %ecx andl \$0xffffffffffffc, %ecx                                                       |
|                             | AsmJSLoadHeap                       | <pre>cmpl \$0xfffffffc, %ecx ja .Lfrom133022 movl 0x0000(%ecx), %ecx</pre>                         |
|                             | MoveGroup                           | movl %ecx, 0x2c(%esp)                                                                              |
| ppc = ppc + 4 0;            | instruction Addl Unit               | necessary check                                                                                    |
| switch(ins&0x7F) {          | MoveGroup instruction BitOpI:Lutand | mov1 0x2c(%csp), %edx andl \$0x7f, %edx                                                            |
|                             | TableSwitch                         | subl S3. %edx cmpl \$0x71, %edx jae .Lfrom133081  movb(sex) implifffff, %ecx mplougzilla, %eax, 4) |

# How to not idle in JavaScript?

## How to not idle in Javascript but stay responsive?

 Javascript fini efore continuing. Run 20 ms then idle for 0 Warning: Unresponsive script Run 20 ms A script on this page may be ng. You can stop the script now, time then idle for 0 open the script in the debug ms Run 20 ms Script: http:// then idle for 0 Don't ask me again ms Co

Solution: setTimout(function(){...}, 0);

#### What about the worker thread?

 setTimout(function(){...}, 0); doesn't work in worker thread. Message queue is never processed.

• But setTimout(function(){...}, 4); works (4ms

waiting time)



#### What about the worker thread?

Can we get the number of messages in the queue?

## NOPE!



#### Hack

 Play message ping pong, so that at least one message is always in the queue.



#### This was harmless!

## Next year: The horror of timing in JavaScript

## How to implement a (streaming) audio device into JavaScript?

Unreliable speed of Javascript
Only millisecond time resolution
Interrupts exists only, when you are idle
Message queue between worker and master

# The Filesystem

How to implement an efficient filesystem with a size of 200MB and 5000 files that runs over the internet?

# The Filesystem

- How long does a "du /" take over the internet?
  - NFS
  - Samba
  - Sshfs
  - On demand block device



### Problem is mainly latency, not throughput

Advantages of our filesystem:

- Read only filesystem on server

# The Filesystem

- Implement filesystem outside of the emulator to have full control
  - tmpfs like. Use virtio/9p to exchange commands with Linux
- Load the filesystem layout and metadata during the Linux boot process.

- OpenRISC binaries compress really well
- bz2 currently, in future .xz
- Ordinary web server needed
- Future: dependencies between files, packages
  - http 2.0 will help here

# Boot process timeline



#### time

Load of filesystem from server

Parallel load of files from filesystem

Non-parallelized decompression of file

## Additional features of the filesystem

- Atomic file operations
- Full control from the outside
- Watching Files
- Upload files into home folder
- Download home folder (as .tar)
- My own cloud: Sync with server
  - Unique user id (http://s-macke.github.io/jor1k/?user=cdqKKPxjfa)
  - Currently 1MB quota
  - server only needs upload.php

# Network



 Yo dawg, I heard you like browsing the web, so I put a browser in your browser so you can browse while you browse! (Twitter user Scott Elcomb)

### Network

- Server in the USA
  - connected via websockets
  - Sending and receiving ethernet frames connected to a Linux TAP device
- Full working intranet
  - Start jor1k in two windows and open a ssh session between them.
- Major network applications available
  - wget, curl, nc, ping, traceroute, telnet, ssh, nmap
  - Openssl with certificates
  - Web browsers: lynx, links, dillo

### Is the emulator useful?

- Well, sort of ...
  - Technology demonstration, Advertisement
  - Online interactive tutorials
  - Easy way to port and present terminal software
  - Teaching programming languages
  - Rogue like Network access
  - Fast testing environment for binaries
  - JavaScript benchmark
  - You can play games like Doom, Monkey Island,
     Elite II and Toppler

# Outlook

# "Future" slide from last year

- Sound √
- SMP **√**
- Run Debian (just one bug left) X
- Run Firefox X (compiles but crashes and too big)
- Status, statistics and debug screen X
- Download already booted Linux (state file) X
- More terminals, better user interface, directX access to the filesystem tree
- Dynamic recompiler with the eval function? 
   (slow)

### Future 2.0

- Full virtio driver support (done)
- Virtio-GPU and full-screen X-Window system
- Implementation of virtio for RiSC-V
- More terminals, better user interface
- Download already booted Linux (state file)
- Status, statistics and debug screen
- 64-Bit RISC-V
- Run Firefox

## Modules in the near future



### **Thanks**

- Stefan Kristiansson for the toolchain and infinite help in the chat.
- Ben Burns for implementing the network and providing the relay server
- Prannoy Pilligundla for implementing RISC-V
- Lawrence Angrave and Neelabh Gupta for the C-development website
- Jonas Bonn for the Linux kernel support
- Christian Svensson for the OpenRISC Debian distribution



### RISC-V Simple Demo

| bbl | , |
|-----|---|
|-----|---|

Restart with new binary image: ①

#### 6.4 MIPS

```
0.120000] rfbdevice found? 4 0 0182de00
    0.140000] Switched to clocksource riscv clocksource
    0.180000] NET: Registered protocol family 2
    0.220000] TCP established hash table entries: 1024 (order: 0, 4096 bytes)
    0.220000] TCP bind hash table entries: 1024 (order: 0, 4096 bytes)
    0.220000] TCP: Hash tables configured (established 1024 bind 1024)
    0.220000] UDP hash table entries: 256 (order: 0, 4096 bytes)
    0.220000] UDP-Lite hash table entries: 256 (order: 0, 4096 bytes)
    0.220000] NET: Registered protocol family 1
    0.240000] futex hash table entries: 256 (order: -1, 3072 bytes)
    0.420000] 9p: Installing v9fs 9p2000 file system support
    0.420000] io scheduler noop registered
    0.420000] io scheduler cfg registered (default)
    1.6000001 htifcon htif1: detected console
    1.960000] console [htifcon0] enabled
    2.000000] htifblk htif2: detected disk
    2.160000] htifblk htif2: added htifblk0
    2.180000] htifrfb htif3: detected framebuffer
    2.220000] 9pnet: Installing 9P2000 support
    2.280000] VFS: Mounted root (ext2 filesystem) readonly on device 254:0.
    2.280000] devtmpfs: mounted
    2.280000] Freeing unused kernel memory: 64K (c0000000 - c0010000)
Susybox started
```

# Idea for a good name?

- FEW the Fastest Emulator of the Web.
- RISE RISCV Instruction Set Emulator
- ORE Online RISCV Emulator
- RETRO RISCV Emulator That ROcks
- RETRO RISCV Emulator That Runs Online
- ERIN Emulation of RISC-V in a nutshell
- OVER Online Version of RISC-V
- OVER Online Versatile Emulator of RISC-V
- FEAR Fastest emulator around the R????

# Thanks for your attention!