2013-09-15

Xshell及Vim中文乱码解决

首先要保证Linux本地语言设置，主要是LC_ALL的设置（该设置会覆盖其他设置），可以使用locale查看。临时修改使用export LC_ALL=”zh_CN.UTF-8”，永久修改需要修改文件/etc/profile增加或修改export LC_ALL=zh_CN.UTF-8，修改完后无需重启。

其次要保证Xshell的连接编码使用utf-8而非默认编码。最后设置vim修改~/.vimrc增加set fileencodings=utf-8,gbk一行

2013-09-05

课程总结

Cracking the coding interview Q19-20

19.1 Write a function to swap a number in place without temporary variables

x=x^y y=x^y x=x^y

**19.2 Design an algorithm to figure out if someone has won in a game of tic-tac-toe** We could use 1 for player1 and 2 for player2 and sum caculation to judge if any player wins or we can simply go though all directions to judge after any player make a move. **19.3 Write an algorithm which computes the number of trailing zeros in n factorial** Just caculate how many 5 are there in factors as 2 is usualy enough and be careful that 25 has two 5 rather than one.

int NumZeros(int n){
    if(n < 0) return -1;
    int num = 0;
    while((n /= 5) > 0){ //deal with factor like 25
        num += n;
    }
    return num;
}

19.4 Write a method which finds the maximum of two numbers. You should not use if-else or any other comparison operator. EXAMPLE Input: 5, 10 Output: 10

int mmax(int a, int b){
    int c = a - b;
    int k = (c>>31) & 0x01; //sign bit
    return a - k*c; //marvelous
}

19.5 The Game of Master Mind is played as follows:
The computer has four slots containing balls that are red (R), yellow (Y), green (G) or blue (B). For example, the computer might have RGGB (e.g., Slot #1 is red, Slots #2 and #3 are green, Slot #4 is blue)
You, the user, are trying to guess the solution. You might, for example, guess YRGB When you guess the correct color for the correct slot, you get a “hit”. If you guess a color that exists but is in the wrong slot, you get a “pseudo-hit”. For example, the guess YRGB has 2 hits and one pseudo hit. For each guess, you are told the number of hits and pseudo-hits Write a method that, given a guess and a solution, returns the number of hits and pseudo hits.

The key is to figure out if any color could be judge multiple times. If not, which is a harder problem, go though the guess and the solution in the same time and judge hit first then increase corresponding color in solution’s count if not hit and reduce color in guess’s count, remember if count is ever bigger or equal to zero then that leads to a pseudo hit count.

19.6 Given an integer between 0 and 999,999, print an English phrase that describes the integer (eg, “One Thousand, Two Hundred and Thirty Four”)

Firstly check zero and then check thousand using length and be careful with eleven to nineteen.

19.7 You are given an array of integers (both positive and negative). Find the continuous sequence with the largest sum. Return the sum EXAMPLE Input: {2, -8, 3, -2, 4, -10} Output: 5 (i.e. , {3, -2, 4} )

It’s a classic dp problem and dp[i] = max{dp[i-1]+k[i], k[i]}

19.8 Design a method to find the frequency of occurrences of any given word in a book

Using tire tree (or hash) and count every word.

19.9 Since XML is very verbose, you are given a way of encoding it where each tag gets mapped to a pre-defined integer value. The language/grammar is as follows:
Element –> Element Attr* END Element END [aka, encode the element tag, then its attributes, then tack on an END character, then encode its children, then another end tag]
Attr –> Tag Value [assume all values are strings]
END –> 01
Tag –> some predefined mapping to int
Value –> string value END
Write code to print the encoded version of an xml element (passed in as string)
FOLLOW UP
Is there anything else you could do to (in many cases) compress this even further?

19.10 Write a method to generate a random number between 1 and 7, given a method that generates a random number between 1 and 5(i.e., implement rand7() using rand5()). FOLLOW use randAB() to generate randCD()

def rand7():
         ret = rand5()+rand5()
         if ret > 7:
            return rand7()
         return ret
     # A is bigger than B
     def randB():
         ret = randA()
         if(ret > B*(A/B))
            return randB()
         return ret%B+1

19.11 Design an algorithm to find all pairs of integers within an array which sum to a specified value.

Firstly sort and then go though the array from the lowest and highest to judge if their sum equals to a specified value.

20.1 Write a function that adds two numbers. You should not use + or any arithmetic operators.

int add(int a, int b){
    while(b != 0){
        int sum = a^b; // (0,1)->1 (1,1)->0 (0,0)->0
        int carry = (a & b)<<1; and="" shift="" left="" to="" generate="" carry="" a="sum;" b="carry;" }="" return="" a;="" better="" solution="" int="" add(int="" a,="" b){="" char="" *c="(char" *)a;="" convert="" address="" (int)&c[b];="" using="" array="" add="" <="" pre="">

20.2 Write a method to shuffle a deck of cards. It must be a perfect shuffle in other words, each 52! permutations of the deck has to be equally likely. Assume that you are given a random number generator which is perfect.  
Randomly pick up a card from total and remove it in the next turn, code like build heap.
def shuffle(cards):
    for i in range(0, len(cards)):
        j= random() % (len(cards)-i) + i #key step
        cards[i],cards[j]=cards[j],cards[i]
    return cards


20.3 Write a method to randomly generate a set of m integers from an array of size n. Each element must have equal probability of being chosen
almost the same as the upper question.
20.4 Write a method to count the number of 2s between 0 and n.
check every factor and count by caculation.
20.5 You have a large text file containing words. Given any two words, find the shortest distance (in terms of number of words) between them in the file. Can you make the searching operation in O(1) time? What about the space complexity for your solution?
It must be solved by using hash table, yet it should be build by traverse the file and record every two words’ min distance and hash key is based on the sum of words and takes O(n^2) time and space to build that hash table. We assume that word order does not matter.
20.6 Describe an algorithm to find the largest 1 million numbers in 1 billion numbers. Assume that the computer memory can hold all one billion numbers
It’s easy to do with the min-heap and replacement algorithm (STL partial sort) or use O(n) partition algorithm (kth biggest elem) as all datas can be hold in memory.
20.7 Write a program to find the longest word made of other words in a list of words EXAMPLE Input: test, tester, testertest, testing, testingtester Output: testingtester
Sort the words from long to short by length firstly and check every word’s prefix and rest letters if they appears in the words array and we could use hash table to optimize the check so the whole takes O(nlgn)+O(n1d) time.
20.8 Given a string s and an array of smaller strings T, design a method to search s for each small string in T
Using tire tree yet you must know how to build one. We can build one based on suffix of the string s (taking O(m^2) time and space to build the tree and judge in kO(n) time) or better we can construct an AC automachine（taking O(m) time to build and kO(n) + O(s) time to judge）
20.9 Numbers are randomly generated and passed to a method. Write a program to find and maintain the median value as new values are generated 
We can build max-heap on the first half and min-heap on the second half then it takes O(1) to get median and O(nlgn) to maintain. In fact, we only need four（or five） numbers (min and max in two halfs) to get median and maintain.
20.10 Given two words of equal length that are in a dictionary, write a method to transform one word into another word by changing only one letter at a time. The new word you get in each step must be in the dictionary EXAMPLE Input: DAMP, LIKE Output: DAMP -> LAMP -> LIMP -> LIME -> LIKE
It could be solved by BFS and trace back method.
20.11 Imagine you have a square matrix, where each cell is filled with either black or white. Design an algorithm to find the maximum subsquare such that all four borders are filled with black pixels
It ask to find the biggest subsquare in black, just go though the square matrix and check. Remember we are deal with the square thus every line length should be equal so if col is smaller than current max then it’s impossible to find a bigger subsquare and we can stop traverse.
20.12 Given an NxN matrix of positive and negative integers, write code to find the submatrix with the largest possible sum  
We can combine row i to j and form a new one-dimension array. It takes O(n) time to find largest sum in a one-dimension array and we get n^2 that kind of arrays in all, thus takes O(n^3) time as a whole.
20.13 Given a dictionary of millions of words, give an algorithm to find the largest possible rectangle of letters such that every row forms a word (reading left to right) and every column forms a word (reading top to bottom)
It ask to form a largest matrix of words so all the chosen words must be in the same length and we could use tire tree to judge prefix of the colums’ words ever exists.

2013-09-03

面试总结

面试题总结：子数组问题

公用题干为一个包含正数及负数的数组k

1.找到连续的子数组和最大，最小，最接近m（比如0）

最大的dp方程为dp[i]=max{dp[i-1]+k[i], k[i]}，最小情况完全类似

def submax(k[]):
    smax=MIN, cmax=0
    for i in range(0,len(k)-1):
        cmax=max(cmax+k[i], k[i])
        smax=max(smax, cmax)
    return smax

最接近m的处理方法，首先考虑特殊的m比如0，此时处理办法：首先用数组sum[0]=0, sum[i+1]=a[0]+a[1]+...+a[i]即包含第一个元素的元素和（这些元素相减可以构成子数组和）进行排序，然后遍历该数组并找到绝对值最小的两个相邻元素。所以对于m也可以采取同样的办法，遍历寻找与m值差绝对值最小的两个相邻元素即可。时间复杂度为O(nlgn)，空间复杂度为O(n)

def subsum(k[], m):
    sums = []
    sums[0]=0
    for i in range(0, len(k)):
        sums[i+1] = sums[i]+k[i]
    sums.sort()
    smin=m, cmin=0
    for i in range(1,len(k)+1):
        cmin=abs(sums[i]-sums[i-1]-m)
        smin=min(smin, cmin)
    return smin

2.找到几个子数组和最大，最小，最接近m

几个子数组和最大的dp方程为dp[i][j]=max{dp[i][j-1]+k[j], max{dp[i-1]p}+k[j]}，即考察k[j]是包含在第i组里面，还是再单独成组。最小情况完全类似。

找到n个数的和最接近m，我们仅考虑小于m且这里不考虑存在负数的情况（因为需要遍历可能的和值），使用类似0-1背包的方法仅有的限制是背包重量不能超过m，时间复杂度为O(n^2*m)

def subsum(k[], m):
#dp[j][p]代表是否找到j个数使得和等于p
dp =[[0 for j in range(m+2)] for i in range(len(k)+1)]
for i in range(1, len(k)+1):
        for j in range(1, m):
               dp[i][j]=max(dp[i-1][j], dp[i-1][j-k[j]]+k[j])
return dp[len(k)][m+1]

3.分割为两个子数组和最接近，添加正负号使结果最接近0

假设2n的正整数数组和为SUM，则问题变为找到n个数和最接近SUM/2，接近SUM/2就可能大于或小于SUM/2，我们仅考虑小于SUM/2且这里不考虑存在负数的情况（因为需要遍历可能的和值），使用类似0-1背包的方法有两个限制：一个是数目等于n，另一个是背包重量不能超过SUM/2，时间复杂度为O(n^2*m)

def subsum(k[], m):

#dp[2n][n][m/2]
dp =[[[0 for p in range(m/2+2)] for j in range(len(k)/2+1)] for i in range(len(k)+1)]
for i in range(1, len(k)+1):
for j in range(1, min(i, len(k)/2)+1):
for p in range(m/2+1, k[j]+1, -1):
dp[i][j][p] = max(dp[i-1][j][p], dp[i-1][j-1][p-k[i]]+k[i])

#进行回溯寻找对应元素
i=len(k),j=len(k)/2, p=m/2+1
for i in range(len(k), 0, -1):
if(dp[i][j][p]==dp[i-1][j-1][p-k[j]]+k[j]):
print k[i]
–j, p-=k[i]
return dp[len(k)][len(k)/2][m/2+1]

#简化算法，只能求出最接近的值
def subsum(k[], m):

#ok[j][p]代表是否找到j个数使得和等于p，初始化为false
ok = [[false for col in range(m/2+1)] for row in range(len(k)/2+1)]
ok[0][0] = true
for i in range(1, len(k)+1):
for j in range(1, min(i,len(k)/2)+1):
for p in range(m/2+1, k[j]+1, -1):
if ok[j-1][i-k[i]]:
ok[j][p]=true

#找到最接近的值
for i in range(m/2+1, -1, -1):
if ok[len(k)/2)][i]:
return i
如果存在负数，则找到最小的那个负数并将其绝对值加到每个元素上，然后重新求SUM/2

4.找到所有和等于m的两个数，三个数，n个数

对于两个数的情况可以使用排序扫描的方法，或是排序再二分，或是使用hash表查找

对于三个数的情况同样使用排序扫描的方法，扫描固定一个值，然后双扫描确定另外两个值，时间复杂度为O(n^2)

n个数的情况和第2问中的最接近m方法相同，必须使用0-1背包

5.连续的子数组和最大扩展到二维，n维，循环数组等情况

对于二维可以枚举另一维的子数组区域，时间复杂度为O(n^2)，而降维后只需O(m)即可得到答案，故而总时间复杂度为O(n^2*m)

数组首尾相连可以分为原问题和数组跨界两种情况，后者可以采用数组所有元素总值减去最小子数组和的方法来解决。

6. 在两个数组中找到两个数和等于m，三个数组中找三个数，n个数组的情况

2013-09-02

系统架构

对象存储文件系统分布式日志设计与实现

背景

为MDS和OSD缓存构建日志系统以加强系统可用性与一致性。当MDS与OSD使用缓存时，为防止突然的故障导致的内存数据丢失，需要使用日志技术以进行数据恢复

任务

日志系统支持文件系统配置日志和文件系统操作更新日志（Redo日志），并在MDS端支持分布式事务

设计

深入研究EXT3文件系统（JBD）及Innodb（MySQL存储引擎）日志系统源码及设计。

更新日志功能（Redo日志）结合缓存以满足WAL协议，在写缓存前先写日志，日志一直进行顺序写，检查点操作保证空间进行循环使用。

配置日志功能加强客户端请求缓存的重新连接发送

分布式事务主要是MDS端的两阶段提交
关键点：日志系统与缓存结合，异步增量检查点，分布式两阶段提交

2013-09-02

系统架构

高效对象存储服务器缓存设计

背景

在OSD端实现读写缓存以优化服务器性能，提升IO性能

任务

设计并实现数据缓存模块，包括缓存存储结构的设计，提供缓存刷新与替换机制及预读机制

设计

参考Linux内核使用基树构建读写混合缓存，整个缓存空间基于一个大的内存池，该内存池使用buddy算法进行管理，读缓存方面实现LIRS算法取代LRU算法以提高缓存命中率，写缓存方面使用异步写回策略（做IO聚合），一定时间需要进行赃页回收以给读缓存热区预留空间。

关于LIRS，主要是LRU算法存在以下问题：

1, 顺序扫描。顺序扫描的情况下LRU没有命中情况，而且会淘汰其它将要被访问的item从而污染cache。

2, 循环的数据集大于缓存大小。如果循环访问且数据集大于缓存大小，那么没有命中情况。

LIRS将数据分为两部分：LIR热区和HIR冷区，第一级是HIR，第二级是LIR，数据先进入到第一级，当数据在较短的时间内被访问两次时成为热点数据则进入LIR，HIR和LIR内部都采用LRU策略。这样，LIR中的数据比较稳定，解决了LRU的上述两个问题。除Recency外引入与locality相关的的IRR概念，即使用最近两次访问之间其他唯一块的数目。该数目体现了再次出现的可能性（越大越不可能再次出现）使用该概念后的LIRS最有可能将最新的访问缓存起来并在下次访问剔除，但是依旧使用recency进行缓存内的调整。

在设计时使用一个栈来剔除recency较大且HIR的块以方便考察（在栈中的HIR块才可能变为LIR块，相应在栈中Recency最大的LIR块被出栈并转化为HIR块，注意这些转化并未导致被剔除出缓存，被剔除缓存的一定是HIR块而无论其是否在栈中）工程实现时无论是栈还是队列都使用list将块链起来实现即可，mysql ndb也是这么干的。每次决定是否将一个HIR块加入到LIR的时候都需要往栈里扫描一遍，这个时间开销是O(c)

优化、难点

读写混合缓存与LIRS算法的冲突的问题采用回收线程定期回收赃页进行改善。多线程并发时有时使用读写锁效率不如互斥锁，原因在前文已述，且多线程调试充满困难。malloc在小文件频繁申请上损耗较多，所以采用内存池解决。

关键点：内存空间管理，多线程同步，缓存替换算法，异步写回

chaozh.com

超然台上仙的知识库|系统架构,移动开发,独立游戏