2014-09-21

面试总结

程序员应该掌握的技术常识问题[持续更新]

1. SAN与NAS的不同，有哪些存储开源社区

san是存储区域网络，使用FC将存储设备连接起来形成的一个网络架构，以基于块的方式存储数据。

nas是网络附属存储，是一种网络存储服务器，以基于文件的方式存储数据。

存储方面的社区有ceph, freenas, openstack, suse, redhat，基本上一个开源产品对应一个开源社区。

2. LINUX内核开发需要注意的问题

内核开发需要注意：使用GNU C而非标准C语法（前者允许零长度数组，允许特殊属性声明__attribute__((aligned(4)));），慎用内核栈，没有用户态c标准库，注意同步和并发中断，不要轻易使用浮点数。

3. 内核中使用链表结构与用户态链表有什么不同

内核链表结构不含数据域，使用时作为成员嵌入到对应结构中；用户态链表实现通常直接使用结构指针，STL实现也是将结构包含在链表节点结构中。

4. 内核中红黑树的使用，特别是文件系统中的使用；基树又是怎么使用的

内核在进程地址空间的线性区管理(vm_area_struct)中使用了红黑树，新版的进程调度也使用了红黑树。

在linux 2.4时，可执行状态的进程被挂在一个链表中。每次调度，调度程序需要扫描整个链表，以找出最优的那个进程来运行。复杂度为O(n)；在linux 2.6早期，可执行状态的进程被挂在N(N=140)个链表中，每一个链表代表一个优先级，系统中支持多少个优先级就有多少个链表。每次调度，调度程序只需要从第一个不为空的链表中取出位于链表头的进程即可。这样就大大提高了调度程序的效率，复杂度为O(1)；在linux 2.6近期的版本中，可执行状态的进程按照优先级顺序被挂在一个红黑树（可以想象成平衡二叉树）中。每次调度，调度程序需要从树中找出优先级最高的进程。复杂度为O(logN)。

在18内核中文件系统方面仅有ocfs2，jffs2及ext3在使用红黑树，ext3在创建reservation window时使用了红黑树结构进行管理，主要是考虑文件增长会导致无法线性分配，所以先预留了一些空间(8块)，具体原因可以参看此文及此文，这颗红黑树的建立是为方便搜索其他预留窗口。

在32内核中连ext2也开始使用分配预留机制所以也使用红黑树，ext3及ext4在启用目录索引(磁盘上是B+树组织)时会在file的private_data中存放了dir_private_info结构，用来将一个打开目录文件的全部目录项组织成红黑树，其 root 字段指向 rbtree 的根 , rbtree 的节点由各 struct fname 结构（包含磁盘目录项的信息）的 rb_hash 字段形成，这颗红黑树的建立是为方便大目录的readdir操作多次进行。

nfs使用红黑树管理Access mode cache即acl元数据信息缓存，btrfs大量使用红黑树。

基树仅在Page cache中使用，将文件逻辑层的数据以页的形式映射到页高速缓存中，这些文件可能来自文件系统（文件数据或目录）也可能是块设备抽象。

5. B树和HASH方法的时间复杂度以及在系统中的表现，文件上散列又有何问题

B树访问是O(lgn)（树高度）复杂度，hash是O(1)复杂度。但是B树可扩展性更好，对于B+树来说在磁盘上对范围访问有很好的支持，相对hash表方法不易扩展且磁盘上范围访问性能不佳。顺序写文件并使用hash在顺序文件上做索引的方法，仍然受hash表大小限制不易扩展，且当索引表无法全部加载到内存时，hash随机访问特性无法在硬盘上发挥。

6.LINUX下线程与进程描述与区别

Linux下线程就是轻量级的进程，其进程描述符task_struct中tgid字段即为线程所属进程的pid，real_parent字段会指向所属进程的real_parent字段，thread_group链表字段会串起所有的线程。同时线程的地址空间mm、打开文件files、文件系统信息fs及信号处理sighand都会共用进程的。线程又分为用户态线程和内核态线程，用户态的线程栈是使用mmap在进程的堆空间分配的。内核线程使用clone系统调用创建时置CLONE_VM标志导致copy_mm()中将active_mm与mm赋值与父进程（所有内核线程的父进程都是kthreadd内核线程，而该内核线程又由swap内核线程创建，故而mm都是NULL）相同，避免复制页表。因为内核线程不会使用用户态地址，故而在运行时mm为NULL，active_mm指向上一个运行进程的active_mm。

7. 大致描述PAXOS过程及偶数台机器的处理

标准Paxos过程涉及三种角色proposer、accpetor以及learner，整个过程分为两大阶段：第一个阶段是选值，第二个阶段是同步，第一阶段中proposer提出编号为n的提案；accepetor在确认自己没有见过高于编号n的提案后回复对应proposer，否则将返回对应提案的值v，当proposer收到大多数的accepetor的确认信息后进入第二阶段；第二阶段中proposer如果收到大多数accepetor的提案值则只能选择该值，否则可以自由选值，并将该值同步到大多数accepetor，此时learner也将从大多数accepetor中学习到该值。paxos利用大多数原则保证一致性。

当因为偶数台出现无法选举成功时，可以先移除一台完成选举并检查网络问题。

8. WORDPRESS大致流程，其中URL固定链接如何实现的

分为admin与user两种大致逻辑。user起点是index.php并进入wp-blog-loader.php装载模型与视图而admin起点为wp-admin.php，之后进入wp-load.php加载wp-config.php初始化wp及wp-settings.php包括所有启用的plugins和主题的functions.php完成模型，之后uer使用template-loader.php完成视图，admin使用固定文件完成对应视图。

首先需要web服务器开启rewrite功能，将url统一定位到index.php，其后由WP接管负责处理url。整个过程如下：

a. wp初始化过程；b.在wp-setting.php中建立全局变量，特别是$wp和$wp_rewrite；c. 在classes.php中调用$wp的parse_request方法解析URL，该方法会处理$_SERVER['REUEST_URI']或$_SERVER['PATH_INFO']从而建立变量$request，然后使用$wp_rewrite的wp_rewrite_rules方法获取rewrite规则并逐条对$request进行匹配(preg_match)，如果找到一个匹配, 则建立相关的变量$perma_query_vars并最后保存在 $wp->query_vars里；其中rewrite规则会保存在数据库的option表中。主要参考这篇文章。

9. VFS大致构成介绍，其中 DENTRY结构如何形成， INODE结构如何形成

VFS为同一个目录结构下可能挂载的不同文件系统提供了统一接口，VFS提供了通用文件模型的多种对象内存结构包括file_system_type, super_block，inode，file，dentry以及进程相关结构包括files_struct及fs_struct，还有通用的页高速缓存page cache。

其中dentry结构在lookup操作发现dentry缓存中没有对应目录项时会从磁盘上读取目录内容找到对应目录项的内容（该操作为底层操作系统的lookup操作）并建立对应的dentry内存结构。

inode结构有三种形成方式，第一种是超级块载入时会将对应的inode读入并生成内存结构；第二种是mknod或create系统调用创建新的inode结构并标记为脏同步磁盘；第三种是lookup操作时目录项不在缓存中，完成dentry创建后还可能需要完成对应inode内存结构的创建并进行绑定。

10. JBD大致介绍及产生原因

JBD是一个日志子系统为ext系列文件系统提供一个独立的日志化系统，利用数据库中的日志技术实现WAL协议以加速e2fsck对系统一致性检查及系统一致性恢复的速度。JBD提供原子操作和事务两种抽象接口，前者保证多个低级操作构成的一个高级操作可以原子性的更新到文件系统，后者保证多个原子操作会原子性的提交到日志。JBD提供三种模式：journal模式会将数据及元数据都记入日志；ordered模式是默认模式会在元数据提交到日志前保证将对应数据写到磁盘对应位置；writeback模式只记录元数据到日志无视数据。

11. 什么是高端内存，在64G内存下又是什么情况

在32位系统中，线性地址空间是4G，其中内核规定3~4G的范围是内核空间，它映射到了物理地址的0~1G上.。然而实际上没有映射1G，只映射了896M。大于896M的物理地址即高端内存，是没有写死的页表来对应的，内核必须要建立映射才能访问它们。该机制可以满足1G的内核线性空间寻址大于1G的物理空间的要求，比如应付PAE下对64G物理空间的寻址。

如果机器内存不足896M，就不存在高端内存。如果是64位机器，因为可用地址空间很大，也不存在高端内存，其中属于内核的线性空间默认为2G，具体规划可见此文。

12. VM何时会产生，新建进程时已经刷新过页表之前为何还要分配VM

父进程通过fork创建子进程时使用copy_mm时会复制父进程的所有线性区（dup_mmap会遍历链表并进行复制），之后根据线性区情况填充对应子进程的页表。当可执行映象映射到进程的虚拟地址空间时，通过exec调用do_mmap也可能产生新的线性区。

进程建立vm_area_struct结构后，只是说明进程可以访问这个虚存空间，但有可能还没有分配相应的物理页面并建立好页面映射。在这种情况下，若是进程执行中有指令需要访问该虚存空间中的内存，便会产生一次缺页异常。这时候，就需要通过vm_area_struct结构里面的vm_ops->nopage所指向的函数来将产生缺页异常的地址对应的文件数据读取出来。

首先子进程需要继承父进程的地址空间，其次对于以后会调用exec的子进程也需要所有线性地址并标记为只读以确保能触发COW机制。

13. 如何实现线程安全

实现线程安全可以采用不共享状态变量，使用原子操作，或使用同步手段。

14.大致描述下LIRS算法的过程

所有缓存被分为冷区及热区，热区占用大部分缓存，另外准备一个空间用于存储最近被访问的块。当空间中一个块被命中时就会调至顶部，并通过操作移除空间底部的所有冷块直至底部是热块（这保证底部一定是热块且当该热块被移除后上面的冷块是没机会变成热块的）；当空间中冷块被访问到就会转换为热块，同时空间底部的热块就会变为冷块并移至冷区缓存顶部，注意此时该冷块不一定出现在冷区中；当冷区中的冷块命中而空间中没有命中时，只要将其移到到冷区顶部；当未命中时，就必须替换冷区缓存底部的块。注意在空间中的块不一定在缓存中。

15. 进程同步的方法，各有什么优缺点

很容易与进程间通信(IPC)混淆，进程间通信有以下多种：管道，socket，消息队列，信号，信号量，共享内存

而同步方法有：文件锁，信号量，管道+select等，socket+select等

IPC的各种方法中消息队列不需要同步控制，但需要两次拷贝且需要处理上次的残余消息。共享内存需要同步控制，通常和信号量共同使用。管道和socket使用了内核锁进行同步控制。

文件锁效率不如信号量，但文件锁使用简单且会在进程退出时自动释放。

16. STATIC变量与全局变量区别，DLL共享了哪些信息

声明了static的变量或函数就失去了全局可见性仅能在声明文件内被访问；全局变量和static变量都在所谓的“静态存储区”，会在开始运行时即完成初始化，默认为0；所以主要区别还是访问域的。

17.一个星期秒数的宏定义写法

#define WEEK_SEC (7*24*3600UL)，注意整形常量默认声明为int类型尽管32位下够用但声明为L兼容性更好，而秒数不可能为负数且可能由星期宏定义扩展为月或年故声明为UL更好

18. MYSQL有哪些存储引擎，MYISAM与INNODB有什么不同

MySQL有MyISAM，InnoDB，HEAP(内存)，BDB，CSV等

19. GFS的元数据控制与数据控制流

20. 什么是僵死进程，什么是孤儿进程，进程退出会自动释放哪些资源

由于允许父进程调用wait系统调用检查子进程退出情况，因此子进程运行结束后需要继续等待，此时通知父进程并进入EXIT_ZOMBIE状态；而如果父进程先于子进程退出，则会将孤儿子进程交由init进程管理，这样init进程可以使用wait系统调用检查子进程终止然后删除EXIT_ZOMBIE状态进程；所以僵死进程一定是父进程没有退出且没有调用wait进行处理的子进程，而孤儿进程是父进程先于子进程退出的进程且一定会被init妥善管理，所以前者多了会消耗资源而后者不会。

进程退出会向所有线程发出退出信号，自动关闭文件描述符，释放socket连接，存储映射区也会自动解除（关闭文件描述符并不能解除存储映射），释放文件上的锁，信号量仅在设置undo标志时才会自动回收。

2014-08-28

前端

Web Components生态圈现状

由Polymer的主要开发人员总结的Web Components生态圈现状，还是Polymer的各种广告。

2014-08-25

前端

使用data-*属性指南

本文内容参考自MDN的using-data-attributes-in-javascript-and-css一文

一篇2012年的文章，目前data-属性作为HTML5规范已经不能算是新特性了，主要是JSconf里看到troopjs使用，而boss问其他小伙伴时他们居然都不知道，于是就想总结一番。

data-属性使用起来也很直观：

<article id=”electriccars” data-columns=”3” data-indexnumber=”12314” data-parent=”cars”>…</article>

;

JS可以使用getAttribute方法获得data-属性的值，当然HTML5规范里也提供更为简便的方法:

var article = document.querySelector(‘#electriccars’),
data = article.dataset;

// data.columns -> “3”
// data.indexnumber -> “12314”
// data.parent -> “cars”

每个属性返回都是string类型，但是可以直接设置article.dataset.columns = 5 从而改变属性的值

大多数人不知道的是HTML5还规定了data-属性相关CSS接口，可以通过该属性值来填充内容:

article::before {
content: attr(data-parent);
}

你也可以在选择器中使用data-属性:

    article[data-columns=’3’]{
      width: 400px;
    }
    article[data-columns=’4’]{
      width: 600px;
    }

data-*属性的优势在于相关数据直接与dom节点绑定，css接口使用也挺方便，至于效率方面是否比window storage更高，留待以后测试。未来结合Web Component技术，倒是又可以大放异彩。

2014-08-10

前端

漫谈Web Components技术

第一次听说Web Components相关技术是在2013年从Shadow DOM 101这篇文章得知，只当是个模板技术的浏览器实现。谁想在Google IO 2014竟然将基于Web Components技术的Polymer专门做了个专题进行大篇幅介绍，其中Custom Element加上Imports的组合概念堪称惊艳，有种前端界的MapReduce的感觉。其实Google早在13年就已经花了部分精力对Web Components技术进行了介绍(链接是当时演讲里的ppt，部分协议标准已经产生了变化)，只是未像IO 14一样做专题介绍。

比如：

Live demo - 自己实现了一个，得用现代浏览器(推荐webkit内核)打开才行

使用Custom Element创建后可以通过HTML Import加载，然后直接使用

<x-gangnam-style></x-gangnam-style>

1. 规范现状

Web Components技术由W3C规范进行定义，实际由四种技术组成：

Custom ELements，可以创建自定义html标签元素，而且支持对现有html元素进行扩展，从而自定义新的API
HTML Template, 使用template标签包含html相关代码片段，作用行为基本类似于JS模板引擎里的视图(View)
Shadow DOM,，可以创建对外隐藏的web片段，关键是支持独立的样式，不会影响所在文档内的其他样式
HTML Imports, 提供对模板或者自定义元素这些资源的加载支持，可以像引用JS或CSS一样引用HTML组件
截至2014年8月17日，四个技术标准中HTML Template已经从Web Components规范中移除，并加入到HTML5标准中，这意味着该技术协议已经处于完成状态，不会再出现改动。Custom ELements也进入“Last Call”阶段，今后改动可能性也不大了。另外两个仍然存在修改的可能，因此下面的介绍只能当做谈资了解了解罢了。有关Web Components的规范现状与详细定义可以在这个链接中找到。

2. Custom ELements

原本Custom ELements设计是通过新增的element标签来实现自定义html标签元素，但由于种种问题，最终删去了相关规范。目前只能通过document.registerElement(tagName, prototype)方法来定义标签。该方法提供元素原型继承的接口，可以通过Object.create方法创建元素作为prototype传入。在这个gangnam-style示例中就使用：

var gangnamProto = Object.create(HTMLElement.prototype);

  gangnamProto.createdCallback = function(){
     // deal with element view create 
  }

  gangnamProto.attachedCallback = function(){
    // deal with event bind
  }

  document.registerElement("x-gangnam-style", {
    prototype: gangnamProto 
  });

接着就可以直接在html文件中使用x-gangnam-style标签了

3. HTML Template

该技术规范已经加入到HTML5规范中，成为一个新增的HTML5标签。使用非常简单，直接在html文件中添加template标签即可。该标签在浏览器中会被渲染处理，但不会展示，因此会比以前使用script做js的模板容器在效率上会高一些。需要注意的是获得template标签中的内容，需要使用其content属性，该属性返回documentFragment类型。而将content内容插入到目标元素中后会导致内容丢失，因此需要使用cloneNode方法进行拷贝保证其他元素仍然可以使用这个template中的content内容，或是使用document.importNode方法进行拷贝。在这个gangnam-style示例中就使用：

<template id="gangnam-style-tmpl">
<style>
...
</style>

<div id="maia-signature">
<div id="sig" class="zg-sig">
 <div id="psydroid" class="zg-psydroid">
 <img alt="" class="zg-psydroid-head" src="http://www.google.com/zeitgeist/2012/images/psydroid-head.png"> 
 <img alt="" class="zg-psydroid-body" src="http://www.google.com/zeitgeist/2012/images/psydroid-body.png"> 
 <img alt="" class="zg-psydroid-arm-l" src="http://www.google.com/zeitgeist/2012/images/psydroid-arm-l.png">
 <img alt="" class="zg-psydroid-arm-r" src="http://www.google.com/zeitgeist/2012/images/psydroid-arm-r.png">
 <img alt="" class="zg-psydroid-leg-l" src="http://www.google.com/zeitgeist/2012/images/psydroid-leg-l.png">
 <img alt="" class="zg-psydroid-leg-r" src="http://www.google.com/zeitgeist/2012/images/psydroid-leg-r.png">
 </div>
</div>
</div>
</template>
<script>
 ...  
gangnamProto.createdCallback = function(){     
   //deal with element view create     
   var thisDoc = document.currentScript.ownerDocument;  
   var tmpl = thisDoc.querySelector("#gangnam-style-tmpl");               
   //shadow.appendChild(tmpl.content.cloneNode(true)); also works!
   shadow.appendChild(document.importNode(tmpl.content, true)); 
} 
...  
</script>

这样就可以在Custom Element中使用template中已经定义好的html结构与样式。当然为保证接口与样式的封装性，这里还将HTML Template与Shadow DOM结合了起来。

4. Shadow DOM

这个技术规范堪称Web Components的灵魂所在，而且早已经用于HTML5标签技术中，只是不曾暴露接口给JS调用罢了。例如HTML5中的video标签就包含了Shadow DOM。现在可以在任何DOM节点上创建Shadow DOM，使用这个技术的好处在于Shadow DOM可以完全封装其中的dom结构与style样式，使得相关信息对用户透明。特别是包含在其中的样式，完全不会影响到其他元素，从而使得形成独立UI组件成为可能。Shadow DOM创建起来非常简单，调用createShadowRoot即可：

 gangnamProto.createdCallback = function(){
   …
   var shadow = this.createShadowRoot();
   shadow.appendChild(document.importNode(tmpl.content, true));
 }

同时Shadow DOM还提供<content>标签，该标签可以使用rel属性让浏览器选择性的填充其中的内容。（~~不知为何该标签在Chrome 36里失效~~ 实际没有失效，只是dev tool展示比较奇怪，让本人误会了）
为方便CSS样式进行Shadow DOM的匹配选择，该技术还提供了多个CSS相关选择符，包括::shadow，/deep/，::host，::host-context，::content等。
目前该技术仍然在修订中，所以还存在不少变数。

5. HTML Import

在前面将Web Component封装在一个HTML文件中后，如何引用该组件并重复运用就成为了问题，这时就需要使用HTML Import技术。该技术使用起来很方便，一句话即可：

<link rel=”import” href=”./x-gangnam-style.html”>

同时可以使用rel=import获得该link节点：

var link = document.querySelector(‘link[rel=import]’);

目前该技术也仍然在修订中，而且很多细节都在变更。

6. 总结

这些技术目前还没有浏览器完全支持，而且规范还在修改中，因此完全不存在工业界使用的可能性。但是这些技术其实早就在浏览器中不断使用，只是没有暴露相关接口而已，规范需要决定的只是怎么暴露，如何让开发者方便使用而已，因此这些Web Component技术的普及可谓是必然。当然，Web组件的最终形态肯定不是直接使用这些规范，多看看微软GUI组件的发展轨迹就可以知道基于新的Web Component规范的Web组件未来还有很长的路要走。

2014-07-19

产品设计

关于ipad蓝牙键盘设计

为手中的ipad买了个B.O.W航世蓝牙键盘，本意是方便写博客，但是实际使用后发现键盘设计问题颇多，还是样式设计大于实际功能设计，说白点就是好看不好用。

首先最大的问题是没有取消键，ipad默认拼音输入法带有联想输入功能，而本人在使用键盘输入时竟找不到取消输入功能，即通常键盘都有的esc键功能。该键盘对应位置放了个home键，习惯性的按下去就会退回到ipad主页，直接切出了app程序。琢磨了半天也没发现对应esc的按键在哪里，如今也只能用空格键＋backspace键代替，需要多按一下，好不尴尬。

其次是没有导航相关按键的绑定，即不能用键盘来浏览ipad页面上放置的app，还得自己手进行滑动并点击才能选择。当然这个可能是IOS系统本身就没有提供对应的API，不过键盘如果支持的话更符合用户的要求。

然后是键盘上很多按键设计得莫名奇妙，看图标样式完全不知道是干什么用的，都试过一遍后发现十几个键里也只有一半有用，其他完全没有绑定IOS操作。

使用后感觉Ipad蓝牙键盘只能用来满足输入要求，基本就是虚拟键盘都完全替代品，对方便操作没有什么帮助。这篇文章就是使用该ipad蓝牙键盘输入的，如果有什么错误，那都是联想输入法害的，谁要我怎么也无法在键盘上找到esc取消键呢。

new! 使用ipad进行输入的另一个麻烦是复制粘贴很不方便，选择部分文字还是得使用本人的fat finger在屏幕上不断乱点，n多次会失败，麻烦的很。而且多个app间进行切换简直就是灾难。不懂ipad为何不能设计像surface pro一样切屏显示。

2013-12-25

后端语言

Why [Programming Language X] Is Unambiguously Better than [Programming Language Y]

转载神文一篇如下：

Recently I have seen a lot of people wondering about the difference between [X] and [Y]. After all, they point out, both are [paradigm] languages that target [platform] and encourage the [style] style of programming while leaving you enough flexibility to [write shitty code].

Having written [simple program that’s often asked about in phone screens] in both languages, I think I’m pretty qualified to weigh in. I like to think about it in the following way: imagine [toy problem that you might give to a 5th grader who is just learning to program]. A [Y] implementation of it might look like this:

[really poorly engineered Y code]

Whereas in [X] you could accomplish the same thing with just

[slickly written X code that shows off syntactic sugar]

It’s pretty clear that the second is easier to understand and less error-prone.

Now consider type systems. [Religious assertion about the relative merits and demerits of static and dynamic typing.] Sure, [Y] gives you [the benefit of Y’s type system or lack thereof] but is this worth [the detriment of Y’s type system or lack thereof]? Obviously not!

Additionally, consider build tools. While [Y] uses [tool that I have never bothered to understand], [X] uses the far superior [tool that I marginally understand]. That’s reason enough to switch!

Finally, think about the development process. [X] has the amazing [X-specific IDE that’s still in pre-alpha], and it also integrates well with [text-editor that’s like 50 years old and whose key-bindings are based on Klingon] and [IDE that everyone uses but that everyone hates]. Sure, you can use [Y] with some of these, but it’s a much more laborious and painful process.

In conclusion, while there is room for polyglotism on the [platform] platform, we would all be well served if you [Y] developers would either crawl into a hole somewhere or else switch to [X] and compete with us for the handful of [X] jobs. Wait, never mind, [Y] is awesome!

2013-09-15

系统经验

Xshell及Vim中文乱码解决

首先要保证Linux本地语言设置，主要是LC_ALL的设置（该设置会覆盖其他设置），可以使用locale查看。临时修改使用export LC_ALL=”zh_CN.UTF-8”，永久修改需要修改文件/etc/profile增加或修改export LC_ALL=zh_CN.UTF-8，修改完后无需重启。

其次要保证Xshell的连接编码使用utf-8而非默认编码。最后设置vim修改~/.vimrc增加set fileencodings=utf-8,gbk一行

2013-09-05

课程总结

Cracking the coding interview Q19-20

19.1 Write a function to swap a number in place without temporary variables

x=x^y y=x^y x=x^y

**19.2 Design an algorithm to figure out if someone has won in a game of tic-tac-toe** We could use 1 for player1 and 2 for player2 and sum caculation to judge if any player wins or we can simply go though all directions to judge after any player make a move. **19.3 Write an algorithm which computes the number of trailing zeros in n factorial** Just caculate how many 5 are there in factors as 2 is usualy enough and be careful that 25 has two 5 rather than one.

int NumZeros(int n){
    if(n < 0) return -1;
    int num = 0;
    while((n /= 5) > 0){ //deal with factor like 25
        num += n;
    }
    return num;
}

19.4 Write a method which finds the maximum of two numbers. You should not use if-else or any other comparison operator. EXAMPLE Input: 5, 10 Output: 10

int mmax(int a, int b){
    int c = a - b;
    int k = (c>>31) & 0x01; //sign bit
    return a - k*c; //marvelous
}

19.5 The Game of Master Mind is played as follows:
The computer has four slots containing balls that are red (R), yellow (Y), green (G) or blue (B). For example, the computer might have RGGB (e.g., Slot #1 is red, Slots #2 and #3 are green, Slot #4 is blue)
You, the user, are trying to guess the solution. You might, for example, guess YRGB When you guess the correct color for the correct slot, you get a “hit”. If you guess a color that exists but is in the wrong slot, you get a “pseudo-hit”. For example, the guess YRGB has 2 hits and one pseudo hit. For each guess, you are told the number of hits and pseudo-hits Write a method that, given a guess and a solution, returns the number of hits and pseudo hits.

The key is to figure out if any color could be judge multiple times. If not, which is a harder problem, go though the guess and the solution in the same time and judge hit first then increase corresponding color in solution’s count if not hit and reduce color in guess’s count, remember if count is ever bigger or equal to zero then that leads to a pseudo hit count.

19.6 Given an integer between 0 and 999,999, print an English phrase that describes the integer (eg, “One Thousand, Two Hundred and Thirty Four”)

Firstly check zero and then check thousand using length and be careful with eleven to nineteen.

19.7 You are given an array of integers (both positive and negative). Find the continuous sequence with the largest sum. Return the sum EXAMPLE Input: {2, -8, 3, -2, 4, -10} Output: 5 (i.e. , {3, -2, 4} )

It’s a classic dp problem and dp[i] = max{dp[i-1]+k[i], k[i]}

19.8 Design a method to find the frequency of occurrences of any given word in a book

Using tire tree (or hash) and count every word.

19.9 Since XML is very verbose, you are given a way of encoding it where each tag gets mapped to a pre-defined integer value. The language/grammar is as follows:
Element –> Element Attr* END Element END [aka, encode the element tag, then its attributes, then tack on an END character, then encode its children, then another end tag]
Attr –> Tag Value [assume all values are strings]
END –> 01
Tag –> some predefined mapping to int
Value –> string value END
Write code to print the encoded version of an xml element (passed in as string)
FOLLOW UP
Is there anything else you could do to (in many cases) compress this even further?

19.10 Write a method to generate a random number between 1 and 7, given a method that generates a random number between 1 and 5(i.e., implement rand7() using rand5()). FOLLOW use randAB() to generate randCD()

def rand7():
         ret = rand5()+rand5()
         if ret > 7:
            return rand7()
         return ret
     # A is bigger than B
     def randB():
         ret = randA()
         if(ret > B*(A/B))
            return randB()
         return ret%B+1

19.11 Design an algorithm to find all pairs of integers within an array which sum to a specified value.

Firstly sort and then go though the array from the lowest and highest to judge if their sum equals to a specified value.

20.1 Write a function that adds two numbers. You should not use + or any arithmetic operators.

int add(int a, int b){
    while(b != 0){
        int sum = a^b; // (0,1)->1 (1,1)->0 (0,0)->0
        int carry = (a & b)<<1; and="" shift="" left="" to="" generate="" carry="" a="sum;" b="carry;" }="" return="" a;="" better="" solution="" int="" add(int="" a,="" b){="" char="" *c="(char" *)a;="" convert="" address="" (int)&c[b];="" using="" array="" add="" <="" pre="">

20.2 Write a method to shuffle a deck of cards. It must be a perfect shuffle in other words, each 52! permutations of the deck has to be equally likely. Assume that you are given a random number generator which is perfect.  
Randomly pick up a card from total and remove it in the next turn, code like build heap.
def shuffle(cards):
    for i in range(0, len(cards)):
        j= random() % (len(cards)-i) + i #key step
        cards[i],cards[j]=cards[j],cards[i]
    return cards


20.3 Write a method to randomly generate a set of m integers from an array of size n. Each element must have equal probability of being chosen
almost the same as the upper question.
20.4 Write a method to count the number of 2s between 0 and n.
check every factor and count by caculation.
20.5 You have a large text file containing words. Given any two words, find the shortest distance (in terms of number of words) between them in the file. Can you make the searching operation in O(1) time? What about the space complexity for your solution?
It must be solved by using hash table, yet it should be build by traverse the file and record every two words’ min distance and hash key is based on the sum of words and takes O(n^2) time and space to build that hash table. We assume that word order does not matter.
20.6 Describe an algorithm to find the largest 1 million numbers in 1 billion numbers. Assume that the computer memory can hold all one billion numbers
It’s easy to do with the min-heap and replacement algorithm (STL partial sort) or use O(n) partition algorithm (kth biggest elem) as all datas can be hold in memory.
20.7 Write a program to find the longest word made of other words in a list of words EXAMPLE Input: test, tester, testertest, testing, testingtester Output: testingtester
Sort the words from long to short by length firstly and check every word’s prefix and rest letters if they appears in the words array and we could use hash table to optimize the check so the whole takes O(nlgn)+O(n1d) time.
20.8 Given a string s and an array of smaller strings T, design a method to search s for each small string in T
Using tire tree yet you must know how to build one. We can build one based on suffix of the string s (taking O(m^2) time and space to build the tree and judge in kO(n) time) or better we can construct an AC automachine（taking O(m) time to build and kO(n) + O(s) time to judge）
20.9 Numbers are randomly generated and passed to a method. Write a program to find and maintain the median value as new values are generated 
We can build max-heap on the first half and min-heap on the second half then it takes O(1) to get median and O(nlgn) to maintain. In fact, we only need four（or five） numbers (min and max in two halfs) to get median and maintain.
20.10 Given two words of equal length that are in a dictionary, write a method to transform one word into another word by changing only one letter at a time. The new word you get in each step must be in the dictionary EXAMPLE Input: DAMP, LIKE Output: DAMP -> LAMP -> LIMP -> LIME -> LIKE
It could be solved by BFS and trace back method.
20.11 Imagine you have a square matrix, where each cell is filled with either black or white. Design an algorithm to find the maximum subsquare such that all four borders are filled with black pixels
It ask to find the biggest subsquare in black, just go though the square matrix and check. Remember we are deal with the square thus every line length should be equal so if col is smaller than current max then it’s impossible to find a bigger subsquare and we can stop traverse.
20.12 Given an NxN matrix of positive and negative integers, write code to find the submatrix with the largest possible sum  
We can combine row i to j and form a new one-dimension array. It takes O(n) time to find largest sum in a one-dimension array and we get n^2 that kind of arrays in all, thus takes O(n^3) time as a whole.
20.13 Given a dictionary of millions of words, give an algorithm to find the largest possible rectangle of letters such that every row forms a word (reading left to right) and every column forms a word (reading top to bottom)
It ask to form a largest matrix of words so all the chosen words must be in the same length and we could use tire tree to judge prefix of the colums’ words ever exists.

2013-09-03

面试总结

面试题总结：子数组问题

公用题干为一个包含正数及负数的数组k

1.找到连续的子数组和最大，最小，最接近m（比如0）

最大的dp方程为dp[i]=max{dp[i-1]+k[i], k[i]}，最小情况完全类似

def submax(k[]):
    smax=MIN, cmax=0
    for i in range(0,len(k)-1):
        cmax=max(cmax+k[i], k[i])
        smax=max(smax, cmax)
    return smax

最接近m的处理方法，首先考虑特殊的m比如0，此时处理办法：首先用数组sum[0]=0, sum[i+1]=a[0]+a[1]+...+a[i]即包含第一个元素的元素和（这些元素相减可以构成子数组和）进行排序，然后遍历该数组并找到绝对值最小的两个相邻元素。所以对于m也可以采取同样的办法，遍历寻找与m值差绝对值最小的两个相邻元素即可。时间复杂度为O(nlgn)，空间复杂度为O(n)

def subsum(k[], m):
    sums = []
    sums[0]=0
    for i in range(0, len(k)):
        sums[i+1] = sums[i]+k[i]
    sums.sort()
    smin=m, cmin=0
    for i in range(1,len(k)+1):
        cmin=abs(sums[i]-sums[i-1]-m)
        smin=min(smin, cmin)
    return smin

2.找到几个子数组和最大，最小，最接近m

几个子数组和最大的dp方程为dp[i][j]=max{dp[i][j-1]+k[j], max{dp[i-1]p}+k[j]}，即考察k[j]是包含在第i组里面，还是再单独成组。最小情况完全类似。

找到n个数的和最接近m，我们仅考虑小于m且这里不考虑存在负数的情况（因为需要遍历可能的和值），使用类似0-1背包的方法仅有的限制是背包重量不能超过m，时间复杂度为O(n^2*m)

def subsum(k[], m):
#dp[j][p]代表是否找到j个数使得和等于p
dp =[[0 for j in range(m+2)] for i in range(len(k)+1)]
for i in range(1, len(k)+1):
        for j in range(1, m):
               dp[i][j]=max(dp[i-1][j], dp[i-1][j-k[j]]+k[j])
return dp[len(k)][m+1]

3.分割为两个子数组和最接近，添加正负号使结果最接近0

假设2n的正整数数组和为SUM，则问题变为找到n个数和最接近SUM/2，接近SUM/2就可能大于或小于SUM/2，我们仅考虑小于SUM/2且这里不考虑存在负数的情况（因为需要遍历可能的和值），使用类似0-1背包的方法有两个限制：一个是数目等于n，另一个是背包重量不能超过SUM/2，时间复杂度为O(n^2*m)

def subsum(k[], m):

#dp[2n][n][m/2]
dp =[[[0 for p in range(m/2+2)] for j in range(len(k)/2+1)] for i in range(len(k)+1)]
for i in range(1, len(k)+1):
for j in range(1, min(i, len(k)/2)+1):
for p in range(m/2+1, k[j]+1, -1):
dp[i][j][p] = max(dp[i-1][j][p], dp[i-1][j-1][p-k[i]]+k[i])

#进行回溯寻找对应元素
i=len(k),j=len(k)/2, p=m/2+1
for i in range(len(k), 0, -1):
if(dp[i][j][p]==dp[i-1][j-1][p-k[j]]+k[j]):
print k[i]
–j, p-=k[i]
return dp[len(k)][len(k)/2][m/2+1]

#简化算法，只能求出最接近的值
def subsum(k[], m):

#ok[j][p]代表是否找到j个数使得和等于p，初始化为false
ok = [[false for col in range(m/2+1)] for row in range(len(k)/2+1)]
ok[0][0] = true
for i in range(1, len(k)+1):
for j in range(1, min(i,len(k)/2)+1):
for p in range(m/2+1, k[j]+1, -1):
if ok[j-1][i-k[i]]:
ok[j][p]=true

#找到最接近的值
for i in range(m/2+1, -1, -1):
if ok[len(k)/2)][i]:
return i
如果存在负数，则找到最小的那个负数并将其绝对值加到每个元素上，然后重新求SUM/2

4.找到所有和等于m的两个数，三个数，n个数

对于两个数的情况可以使用排序扫描的方法，或是排序再二分，或是使用hash表查找

对于三个数的情况同样使用排序扫描的方法，扫描固定一个值，然后双扫描确定另外两个值，时间复杂度为O(n^2)

n个数的情况和第2问中的最接近m方法相同，必须使用0-1背包

5.连续的子数组和最大扩展到二维，n维，循环数组等情况

对于二维可以枚举另一维的子数组区域，时间复杂度为O(n^2)，而降维后只需O(m)即可得到答案，故而总时间复杂度为O(n^2*m)

数组首尾相连可以分为原问题和数组跨界两种情况，后者可以采用数组所有元素总值减去最小子数组和的方法来解决。

6. 在两个数组中找到两个数和等于m，三个数组中找三个数，n个数组的情况

2013-09-02

系统架构

对象存储文件系统分布式日志设计与实现

背景

为MDS和OSD缓存构建日志系统以加强系统可用性与一致性。当MDS与OSD使用缓存时，为防止突然的故障导致的内存数据丢失，需要使用日志技术以进行数据恢复

任务

日志系统支持文件系统配置日志和文件系统操作更新日志（Redo日志），并在MDS端支持分布式事务

设计

深入研究EXT3文件系统（JBD）及Innodb（MySQL存储引擎）日志系统源码及设计。

更新日志功能（Redo日志）结合缓存以满足WAL协议，在写缓存前先写日志，日志一直进行顺序写，检查点操作保证空间进行循环使用。

配置日志功能加强客户端请求缓存的重新连接发送

分布式事务主要是MDS端的两阶段提交
关键点：日志系统与缓存结合，异步增量检查点，分布式两阶段提交